Philosophical discussion of a node

Aristos Queue · May 1, 2007

I have this weekend been studying a most fascinating document, analyzing the nature of object-oriented programming vis-a-vis Aristotle's conception of categories. The analysis leads to some very concrete conclusions. It is somewhat amazing to me that I've never stumbled across this sort of argument prior to this weekend. Thank you to AdamRofer for inspiring this Internet hunt.

What follows is a single paragraph from the paper. I'd like to hear discussion on the pros/cons of the two definitions of "node" mentioned here:

Vlissides gives an example of exceptions (again in the programming language sense) being used to express the categorical negative proposition file nodes cannot contain other nodes [33]. His example models a hierarchical file system, which can be expressed by the following categorical statements: files are nodes; directories are nodes; a node may contain other nodes; file nodes are not nodes that can contain other nodes. He discusses the trade-offs between replacing the last two statements with directory nodes may contain other nodes, or keeping them as stated.

Personally, I find the block of gray text annoying in this day and age of font control and formatting options. There's also a typo in the paragraph that you have to deal with. So let me restate the problem in a way that might be more readable:

When programming a file system, one might consider two different class hierarchies:

1. A 'node' is an element of a file system. A node may contain other nodes. 'Files' are nodes; 'directories' are nodes. Files are those nodes that choose not to include further nodes. Directories are those nodes that choose to include further nodes.
2. A 'node' is an element of a file system. A 'file' is a node. A 'directory' is a node with the ability to include other nodes within itself.

Which of these two class hierarchies is the better choice and -- here's the important part -- why? Should the ability to include other nodes be a capacity of the node class itself or a capacity only of the directory class? What are the ramifications between them? If we say that "file is a node" and "directory is a node", which definition of these classes best reflects the implementation?
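To make the two alternatives concrete, here is a minimal Python sketch of both hierarchies. Python is used only because LabVIEW has no textual form; the class names (Node1, File1, Directory1 and their "2" counterparts) and the add/children methods are illustrative assumptions, not anything taken from the paper or from LabVIEW.

```python
# Hierarchy 1: containment is declared on the generic node.
class Node1:
    def __init__(self, name):
        self.name = name

    def add(self, child):
        # Every node advertises containment; subclasses decide whether to honor it.
        raise NotImplementedError

    def children(self):
        return []


class File1(Node1):
    def add(self, child):
        # A file "chooses" not to contain nodes, so the refusal shows up at run time.
        raise TypeError(f"file {self.name!r} cannot contain other nodes")


class Directory1(Node1):
    def __init__(self, name):
        super().__init__(name)
        self._children = []

    def add(self, child):
        self._children.append(child)

    def children(self):
        return list(self._children)


# Hierarchy 2: the generic node knows nothing about containment.
class Node2:
    def __init__(self, name):
        self.name = name


class File2(Node2):
    pass


class Directory2(Node2):
    def __init__(self, name):
        super().__init__(name)
        self._children = []

    def add(self, child):
        self._children.append(child)

    def children(self):
        return list(self._children)
```

In the first hierarchy, "files cannot contain nodes" is expressed as an exception raised by File1.add, which is the use of exceptions the quoted paragraph attributes to Vlissides. In the second, the same statement is expressed by File2 simply having no add method at all, so client code has to know it holds a directory before it can add anything.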

Side note: This is but one point of discussion highlighted by the document. Honestly, if I had read this three years ago, LabVIEW classes might have a very different implementation!

Tomi Maila · May 1, 2007

QUOTE(Aristos Queue @ Apr 30 2007, 07:56 AM)

When programming a file system, one might consider two different class hierarchies:

1. A 'node' is an element of a file system. A node may contain other nodes. 'Files' are nodes; 'directories' are nodes. Files are those nodes that choose not to include further nodes. Directories are those nodes that choose to include further nodes.
2. A 'node' is an element of a file system. A 'file' is a node. A 'directory' is a node with the ability to include other nodes within itself.

Which of these two class hierarchies is the better choice and -- here's the important part -- why? Should the ability to include other nodes be a capacity of the node class itself or a capacity only of the directory class? What are the ramifications between them? If we say that "file is a node" and "directory is a node", which definition of these classes best reflects the implementation?

This discussion is related to my earlier posts on problems with trees and graphs in LabVOOP. As a theoretical physicist by education, I'm a very practical person. So I like to concentrate on the practical implications of programming models and not so much on philosophical considerations. But I also like elegant programming language constructions, elegant meaning something that allows a simple solution to a wide range of problems. From a philosophical point of view, both answers to your question are equally good. But from an elegant programming point of view, alternative 2, "A 'node' is an element of a file system. A 'file' is a node. A 'directory' is a node with the ability to include other nodes within itself.", is more elegant.

Why? Two issues make the second alternative superior to the first: simplicity and extendability. In general, people are not very good at handling complexity. The delay of Windows Vista is an excellent example of what happens when things get too complicated; people just can't handle it anymore. To handle complex issues, they need to be divided into logical pieces that are easy to comprehend. This is what abstraction in programming is all about: dividing the program architecture into manageable pieces. The question AQ is actually asking is whether complexity should be visible at abstraction layers where it could logically belong but does not necessarily need to belong. From the complexity point of view, the answer is not so simple if we only think of files and directories. But assume that the nodes in a graph or tree could also have other properties in addition to being able to contain other nodes. Should we expose all of those properties on our generic node, or should we leave our generic node totally unaware of them? I don't think we should expose them. Doing so would make the generic node complex and filled with properties that don't necessarily need to be there. We would lose most of the benefit of abstraction, which is to make things simple to understand.

The second issue supporting the second alternative is extendability of the code. If we take the view that specific properties such as "ability to contain other nodes" are properties of generic classes such as the node class, then we make it harder to extend our classes later on. If we later decide to add a new property similar to "ability to contain other nodes", we would need to make changes at the generic class level. On the other hand, if we design our class relationships from the start so that only the absolutely necessary functionality is exposed at each hierarchy level, we end up with a more extendable software architecture.

Perhaps a practical example will help.

Consider a parse tree of a mathematical expression of the form +(2,*(3,5)). The root node in this tree is +, which has subnodes of 2 and *. The latter * has two subnodes 3 and 5. Indeed we can express almost any mathematical expression as a tree of similar form.

+
- 2
- *
  - 3
  - 5

Now the question AQ is asking would be whether each mathematical expression should be aware that any mathematical expression can be constructed from subexpressions. Of course, pure numbers in our example cannot have subexpressions, so for numbers this information would be irrelevant. Indeed, only functions can have subexpressions, or perhaps we should call them function arguments. Now if we adopt the point of view that each node should be subexpression-aware, then we need to ask whether each node should also be aware of other specific features. Should each node be aware of whether it is a function? Should each node be aware of whether it is a number? Should each node be aware of whether it can be converted to a string? I guess you notice this is a never-ending story. Eventually you have to limit the properties that are exposed to all of the nodes but that are not generic in the sense that all nodes have them.

I think a better way is to delegate the functionality to the nodes. For example, consider evaluating the above expression. Our root node is +, which is a function of two expressions. We make a call to "+.evaluate". This function delegates the job to the arguments of this particular + instance, namely to the number 2 and to another function, *. "2.evaluate" simply returns the value 2. "*.evaluate" delegates the evaluation further to the numbers 3 and 5. This recursive process continues until the expression is evaluated. In a similar manner it's possible to implement other kinds of functionality, such as "toString". Eventually most of the functionality can be delegated further.
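The delegation described above can be sketched in a text-based language. The following Python is only an illustration under my own assumptions: the class names Expression, Number, Function, Add and Multiply, and the method name to_string, are hypothetical stand-ins for "evaluate" and "toString" in the post, not part of any LabVIEW API.

```python
class Expression:
    """Generic node: knows nothing about subexpressions."""

    def evaluate(self):
        raise NotImplementedError

    def to_string(self):
        raise NotImplementedError


class Number(Expression):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        # A number has nothing to delegate to; it returns its own value.
        return self.value

    def to_string(self):
        return str(self.value)


class Function(Expression):
    """Only functions hold subexpressions (their arguments)."""

    symbol = "?"

    def __init__(self, *args):
        self.args = args

    def to_string(self):
        # Delegate string conversion to each argument.
        inner = ",".join(arg.to_string() for arg in self.args)
        return f"{self.symbol}({inner})"


class Add(Function):
    symbol = "+"

    def evaluate(self):
        # Delegate evaluation to each argument, then combine the results.
        return sum(arg.evaluate() for arg in self.args)


class Multiply(Function):
    symbol = "*"

    def evaluate(self):
        result = 1
        for arg in self.args:
            result *= arg.evaluate()
        return result


# The tree for +(2,*(3,5)) from the example above.
expression = Add(Number(2), Multiply(Number(3), Number(5)))
```

Calling expression.evaluate() walks the tree recursively: Add asks Number(2) and the Multiply node for their values, and Multiply in turn asks Number(3) and Number(5). The generic Expression class never needs to know which nodes have arguments.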

The delegation scheme is not available in LabVIEW, as LV doesn't allow recursive dynamic dispatch method calls, at least not yet.

There is a little ambiguity in the question. I may have misunderstood what was being asked.

QUOTE(Aristos Queue @ Apr 30 2007, 07:56 AM)

Side note: This is but one point of discussion highlighted by the document. Honestly, if I had read this three years ago, LabVIEW classes might have a very different implementation!

What would LabVIEW Object-Oriented Programming have been like should you have read the document?

Doon · May 1, 2007

Hiya,

I hate to follow Tomi Maila's logic with drivel, but here goes:

We always want our children to be better than ourselves. That means to say that child nodes should contain enhancements beyond their parents. I would call it "specialization through extension".

I agree with Tomi on the point of simplicity. One would not want to add functionality to a node by adding that functionality to the generic node and masking it to the one child.

I'll bet that there are cases where the opposite is true. This file/directory question had me going back-and-forth for a while. It gets even more complicated when one considers examples like the Unified File Systems (of the GNU/Linux variety) in which "everything is a file".

[/two cents]

--H

Aristos Queue · May 1, 2007

QUOTE(Tomi Maila @ Apr 30 2007, 07:53 AM)

What would LabVIEW Object-Oriented Programming have been like should you have read the document?

I'm not sure yet. Among the points of consideration:

a) Documentation. Should we have used "inheritance" as the best term for expressing the relationship between the super class and the sub class? We chose it for its accessibility to users without a programming background, but the article contends the metaphor breaks down in enough places that without an a priori understanding of inheritance, the metaphor isn't helpful to those who are trying to learn its meaning. Here's a paragraph from the document that talks about some of the problems with the term:

The term ‘inheritance’ is often used in a metaphorical fashion in object-oriented programming to describe the relation between a sub-class and a super-class. Here we compare the meaning of the word inheritance with five kinds of transference relations: that of a prototype and its imitation; incremental modification; the Darwinian relation of evolution between species; the Aristotelian relation of logical abstraction between a species and a genus; and the Aristotelian relation of logical abstraction between an individual and a species. We include the incremental modification relation here because it has been said to be the ‘essence’ of inheritance (or, more precisely, incremental modification in the presence of a late-bound self-reference [35, 32]). We find that ‘inheritance’ seems to be an acceptable metaphor for the first three relations, but that it is an exceedingly poor metaphor for the Aristotelian relations of logical abstraction.

b) Proving the correctness of an object hierarchy. Here's a simple case: Class "Person" has two child classes, "Female" and "Student". There's a problem with these classes -- they do not divide the set of objects in class Person exclusively, because a person can be both female and a student. Situations like this do arise in code, particularly when two developers are both creating sub classes. Could the way we establish the inheritance relationship -- through the UI -- be modified to check that all child classes of a given parent use the same discriminator test?

c) The entire discussion about the critical need for all super classes to be abstract. This is one where I had treated it as "a good idea" to have abstract base classes and only instantiate at the leaf levels. The paper includes a proof that it is possible to have all possible hierarchies follow this theme and, moreover, a proof that doing so increases the logical correctness of code. See the section discussing "Is a Square a Rectangle or is a Rectangle a Square?"
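As a rough illustration of point (c), here is what "abstract super classes, instantiate only at the leaves" could look like in Python. This is one common reading of the square/rectangle question, not necessarily the paper's resolution; the abstract parent name Quadrilateral and the area method are assumptions made for the sketch.

```python
from abc import ABC, abstractmethod


class Quadrilateral(ABC):
    """Abstract super class: it can be subclassed but never instantiated."""

    @abstractmethod
    def area(self):
        ...


class Rectangle(Quadrilateral):
    """Concrete leaf with independent width and height."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Quadrilateral):
    """Concrete leaf with a single side; a sibling of Rectangle, not a child."""

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side
```

Because Square and Rectangle are siblings under an abstract parent, neither has to inherit (and then contradict) the other's assumptions, and attempting Quadrilateral() raises a TypeError.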

I spent part of last week sitting in on a Software Engineering capstone course at the University of Oklahoma. The students were working in a language called ACL2, which allows them not only to code their solution but also to use an automatic theorem prover to prove that a given function is correct for all possible inputs. That blows the socks off of any empirical test suite, which cannot possibly cover all possible test cases. This document raises many points for consideration, suggesting that the syntax used to express the super-to-sub relationship can go a long way toward proving the correctness of the code overall. A fascinating area to consider.
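Going back to point (b), a check of this kind might look something like the following Python sketch. It is purely hypothetical: the discriminator class attribute, the shared_discriminator function, and the Learner/Enrolled/NotEnrolled classes are invented for illustration, and nothing like this exists in LabVIEW today.

```python
class Person:
    # Hypothetical convention: each child class names the test that
    # distinguishes it from its siblings.
    discriminator = None


class Female(Person):
    discriminator = "sex"


class Student(Person):
    discriminator = "enrollment"


class Learner:
    discriminator = None


class Enrolled(Learner):
    discriminator = "enrollment"


class NotEnrolled(Learner):
    discriminator = "enrollment"


def shared_discriminator(parent):
    """Return the one discriminator used by all direct children of parent,
    or None if the children disagree or there are no children."""
    used = {child.discriminator for child in parent.__subclasses__()}
    return used.pop() if len(used) == 1 else None
```

With these definitions, shared_discriminator(Person) returns None because Female and Student divide Person along different lines, while shared_discriminator(Learner) returns "enrollment". A real check would also have to verify that the children's values do not overlap, which this sketch does not attempt.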

Tomi Maila · May 2, 2007

QUOTE(Aristos Queue @ Apr 30 2007, 09:35 PM)

I'm not sure yet.

AQ, was the LabVOOP project at NI just a regular project among other new functionality projects or did you leverage external programming language research knowledge by including related research groups or research scientists to the concept planning phase of the project?

Tomi

Aristos Queue · May 2, 2007

QUOTE(Tomi Maila @ May 1 2007, 02:41 AM)

AQ, was the LabVOOP project at NI just a regular project among other new functionality projects or did you leverage external programming language research knowledge by including related research groups or research scientists to the concept planning phase of the project?

There weren't any external-to-NI researchers on the team. The published design papers from many such researchers were used, and various members of the LV team have been hired because they were themselves knowledgeable about language design.
