
Question on AQ's ancient linked list implementation



I've been exploring data structure implementations lately. At the same time I've been trying to reduce my use of references and maintain data flow as much as possible. Using Brian's Linked List code as a starting point, I hacked around for a while until I got something that works, but it takes far too much memory to traverse the list. Turns out I was following a path AQ blazed 3 years ago. (You can get his source code from there.) Except I didn't think of using the Swap Values prim, and I have struggled for quite some time trying to eliminate the data copies.

This is the relevant part of AQ's post explaining the code:

----------------------------

This is a linear list of data, where it is easy to append to the start of the list without ever reallocating all the existing entries. To access items in the list, you have to traverse down the list -- there is no direct access the way there is with an array. I've gone with a very flat hierarchy for implementing this example so that the bare minimum number of VIs exist.

Just as with the Map, the key bit of magic is a child class that uses its parent class in the private data cluster. In this case, the parent class is Root and the child class is Node.
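
[As a rough analogue in a text language: the Root/Node pair can be pictured as the two cases of a recursive list type, with Root as the empty terminator and Node as one element plus a Root-typed "rest of the list". The Rust sketch below is purely illustrative; the names and element type are made up, and the Box is the indirection Rust requires for a recursive type, not a claim about how LV stores the object.]

    // Illustrative only: "Root" plays the empty-list terminator, "Node" carries
    // one element plus the rest of the list.
    #[derive(Debug)]
    enum List {
        Root,                     // the Root class's role: an empty list
        Node(String, Box<List>),  // the Node class's role: data + "next"
    }

    fn main() {
        // Build "b" -> "a" -> Root by hand.
        let tail = List::Node("a".into(), Box::new(List::Root));
        let list = List::Node("b".into(), Box::new(tail));
        println!("{:?}", list);
    }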

There are only four operations currently defined on the list:

  • "Dump to string" allows the list to be shown as a string for debugging (so you can evaluate whether it is working right or not)
  • "Insert at front 1" is one implementation of inserting at the front of the list
  • "Insert at front 2" is a second implementation of inserting at the front of the list; compare the two implementations
  • "Insert at index" walks down the list to a given index and performs an insert at that point

If you open Demo.vi you will see two linked lists, one being inserted using version 1, the other being inserted using version 2.
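
[Continuing the illustrative Rust type above: a rough sketch of what "Dump to string" and "Insert at index" might look like in text form. The names and the walk-down-and-splice logic are illustrative, not AQ's code.]

    enum List {
        Root,
        Node(String, Box<List>),
    }

    impl List {
        // Walk the list and build a debug string, in the spirit of "Dump to string".
        fn dump_to_string(&self) -> String {
            match self {
                List::Root => "<end>".to_string(),
                List::Node(value, next) => format!("{} -> {}", value, next.dump_to_string()),
            }
        }

        // Walk down `index` links, then splice a new node in at that point,
        // in the spirit of "Insert at index".
        fn insert_at_index(&mut self, index: usize, value: String) {
            if index == 0 {
                let rest = std::mem::replace(self, List::Root); // take the tail, don't copy it
                *self = List::Node(value, Box::new(rest));
            } else if let List::Node(_, next) = self {
                next.insert_at_index(index - 1, value);
            }
            // If the index is past the end, this sketch silently does nothing.
        }
    }

    fn main() {
        let mut list = List::Root;
        list.insert_at_index(0, "a".into());
        list.insert_at_index(0, "c".into());
        list.insert_at_index(1, "b".into());
        println!("{}", list.dump_to_string()); // c -> b -> a -> <end>
    }

The std::mem::replace call in the index-0 branch is the same no-copy trick AQ discusses next with the Swap block.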

Ok... so let me try to deal with a long list of questions that have been raised by the Map class. To begin with, the Swap block. Here are the block diagrams for LinkedList.lvlib:Node.lvclass:InsertAfter1.vi and LinkedList.lvlib:Node.lvclass:InsertAfter2.vi. (I used red text so you could see there was actually a difference in those two long names!)

[see images below. I can't attach them inline. -Dak]

In each picture, there's a point marked "A". In version 1, at point A, we make a full copy of the entire list. The information in the cluster is being moved out of one cluster and into another cluster. So LV decides there has to be a copy made so that the two clusters aren't sharing the same data. This defeats the purpose of the LinkedList, which is supposed to do inserts without duplicating the list. In version 2, at point A, we use the Swap primitive, new in LV8.5. We take a default Root object and swap it with the contents of Next. Now the contents of the Unbundle are free to be put into a new cluster without duplicating all that data.
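
[To make the copy-versus-swap difference concrete in text form: below are two insert-at-front sketches over the same illustrative Rust type. The first reads the existing list out by cloning it, the moral equivalent of the full copy at point A in version 1; the second swaps a default value into place first, the moral equivalent of the Swap primitive in version 2, so nothing downstream is duplicated.]

    #[derive(Clone, Debug)]
    enum List {
        Root,
        Node(String, Box<List>),
    }

    // "Version 1" analogue: duplicate the whole existing list, then build the
    // new front node around the duplicate.
    fn insert_at_front_copying(list: &mut List, value: String) {
        let rest = list.clone();                  // full copy of every node
        *list = List::Node(value, Box::new(rest));
    }

    // "Version 2" analogue: swap a default "Root" with the existing contents,
    // then build the new front node around the original, un-copied data.
    fn insert_at_front_swapping(list: &mut List, value: String) {
        let mut rest = List::Root;                // a default, empty value
        std::mem::swap(list, &mut rest);          // no per-node copy here
        *list = List::Node(value, Box::new(rest));
    }

    fn main() {
        let mut list = List::Root;
        insert_at_front_copying(&mut list, "a".into());
        insert_at_front_swapping(&mut list, "b".into());
        println!("{:?}", list); // Node("b", Node("a", Root))
    }

In both cases the final list is the same; the difference is only whether a throwaway duplicate of the old list gets made along the way.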

-------------------------

My question has to do with "the key bit of magic" AQ was referring to. Why do there need to be two node classes? Was it simply a way to trick LabVIEW into doing recursion before recursion was available? Now that recursion is available, shouldn't I be able to replace the Root constant with a Node constant and remove the Root class entirely? I've attached code where I've done that, but Process Explorer and the Desktop Execution Trace Toolkit are both telling me my Insert routines are creating copies. Any ideas?

[attached images: block diagrams of InsertAfter1.vi and InsertAfter2.vi]

DataStructure-LinkedList.zip

Link to comment
Why do there need to be two node classes?
You will always need two classes for this to work.

There's a difference between function recursion and data recursion. The first is a function that invokes itself. The second is a data structure that is self-similar as you proceed through it. LV added function recursion, but that changes nothing about how you define the data types. The data recursion is what we're working with here.

A class -- in *any* programming language -- cannot include itself. Every time you tried to instantiate the object, it would instantiate itself, and then another copy inside that, and another copy inside that and... boom, infinite data allocation. No can do. To do data recursion, you need two data types. If you create such structures using references, you can do it with two types -- the class and a reference to that class: the class contains a reference to itself. When the class allocates, it allocates itself and a NULL reference, which does not immediately allocate further.

LV has a restriction to prevent this -- we don't allow the class to ever refer to itself, whether in a reference or directly. It's an overly broad restriction, logically speaking, but it dramatically simplifies a whole lot of parts of the back end systems -- we felt it was an acceptable trade-off, and we recognize there will be many customers who disagree.

Regardless, what I'm showing in my Linked List impl is a by-value data structure that is recursive. You still need two types. The parent class does not have a recursive definition. The child class includes the parent. Thus, when you instantiate the child, it allocates a parent, but the value does not immediately go any further than that.
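
[The same distinction, sketched in Rust terms. The mapping is loose and illustrative: Rust rejects a directly self-containing type just as described above, allows recursion through a reference-like field that starts out empty, and still needs an explicit Box where LV's two-class, by-value pattern does not.]

    // Direct data recursion is rejected at compile time; the value would have
    // infinite size, the textual version of "boom, infinite data allocation":
    //
    //     struct SelfNode { value: i32, next: SelfNode }  // error: recursive type has infinite size
    //
    // Reference-style recursion works because the field starts out empty (null),
    // so allocating one node does not allocate the next:
    struct RefNode {
        value: i32,
        next: Option<Box<RefNode>>, // "a reference to the class", initially None
    }

    // A two-case, by-value-flavored version: the terminator contains nothing,
    // the node contains one element plus a terminator-typed slot, so creating
    // one node allocates exactly one element and one empty terminator.
    enum List {
        Root,
        Node(i32, Box<List>),
    }

    fn main() {
        let by_reference = RefNode { value: 1, next: None };
        let by_value = List::Node(1, Box::new(List::Root));
        println!("{} {} {}", by_reference.value, by_reference.next.is_none(), matches!(by_value, List::Node(..)));
    }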

Link to comment

You will always need two classes for this to work.

I wasn't very clear with my question. Someday I'll learn not to post when I'm tired. (He says, noticing it's after 10 pm...)

I understand why classes are not able to include themselves as part of the definition. My question was intended to contrast your implementation with an implementation that replaces your Root class with LVObject in the Node class definition. In my LVObject implementation the Node object will still contain another Node object, the same as it does in your version. The difference is that mine is held in an LVObject field and yours is held in a Root field. I'm not understanding why your version doesn't make data copies and mine does, when functionally they are the same.

Link to comment

I spent a couple hours instrumenting your code and profiling memory use with the DETT. Assuming I'm interpreting the trace correctly (that's a big assumption), it looks like your code is also creating copies during the Node:InsertAtIndex method and the Node:DumpToString method. I'm guessing it's because those VIs are essentially recursive?

[see the diagrams below for my guess at how memory is allocated at runtime for object composition. I extrapolated from the diagram in The Decisions Behind the Design.]

If that's the case, there are still some things I don't quite understand. When classes are used in a VI, the VI's dataspace contains a class pointer, not the class object. Even if an entirely new VI dataspace is created, it doesn't necessarily mean the entire data tree needs to be duplicated.

The "Recursive Node Object" diagram shows the memory during the initial call to a recursive VI ("WalkList.vi"), which was passed the Node 1 object. If we execute a GetNextNode (retrieving Node 2) and recursively pass that into WalkList, wouldn't the dataspace for the new clone simply contain a duplicate of the Node 2 pointer rather than duplicating the entire data hierarchy starting with Node 2? And additional recursive calls contain copies of Node pointers at each level until the GetNextNode call returns a LVObject, at which point the recursion rolls back up. No?

[attached images: memory-allocation diagrams for object composition, including the "Recursive Node Object" diagram]

Link to comment
