Blog: MTree

linked m-ary tree structure

About

click to toggle Site Explorer

Initial Thoughts:

The Trees we are concerned with here are not balanced binary trees used for quick access to ordered data. They are simple tree structures where each tree node may have zero or more child nodes. They are often used as in-memory representations of tree structures that occur as artifacts of processing activities, i.e., parse trees resulting from processing markup like XML or HTML.

Our interests focus on using M-ary Trees to support code analysis. We will use instances of a C++ template class MTree<T>, discussed here, to capture the structure of code scopes¹. For example, one measure of complexity of a function written in C, C++, Java, or C# is the number of scopes the function uses for branching and looping to carry out its processing activities. This is closely related to the well-know McCabe Cyclomatic Complexity metric.

Figure 1. - Code Scopes Walker

Figure 2. - Code Scopes Fragment

Figure 3. - MTree<Node> Package Classes

Figure 4. - MTree<Node> Output

Background:

The Spring 2014 CSE687 - Object Oriented Design class will, in their first project, build a code analyzer that, for each file it processes, finds the size and complexity of all function definitions. We will use some of that code to build, in Project #2, a code similarity detector. That would be useful for code refactoring: to find, for later replacement, duplicated code in a set of packages. The intent is to replace duplicated code with invocations of an extracted function.

Eliminating redundancies avoids fixing bugs in multiple places or worse, fixing in only one of several places the bug occurs. Our redundancy analysis seeks to find congruencies within the scope trees of analyzed function definitions. So we will need a way to capture a representation of the these scope trees - enter the class MTree<T>.

Core Idea:

A function's scope tree is something like it's DNA. If two functions have scope subtrees that are identical in structure and the scope nodes have similar line counts the behavior of those parts of the code may not be identical, but their processing is certainly similar. We've shown in Figures 1 and 2 the code and scope tree of member function MTree<Node>::walk(Node*).

Each tree node has been annotated with the line count of the scope it represents. If we find, in some other function, a tree with similar structure and line counts that becomes a candidate to examine for code duplication. We won't go into how we measure similarity - its not complicated, but isn't relevant to a discussion of the M-ary Tree.

Design:

The MTree<Node> class and helper class, MNode<T>, are relatively simple template classes. The template parameter Node used by MTree<Node> is expected to take the type MNode<T> where T represents the type of data held by each node. That may be a library type like a string or some user-defined type that holds composite information.

Each instance of MNode<T> holds a vector of pointers to child nodes, created on the native C++ heap. The class provides methods to add and remove child nodes and mark nodes for traversal. The MTree<Node> class provides a method to make an instance of its Node parameter the tree root and assumes that the Node addChild method will be used to build out the tree.

Client programs interact with an instance of MTree<Node> in two ways. The using code may define a class derived from the Operation<Node> class that defines what processing should occur when nodes are encountered on a depth first tree traversal. The class can provide accessor functions to feed the operation, initializing data and collect results at the end of traversal.

Clients may also use the methods MTree<Node>::find() and MTree<Node>::finder(). find() simply returns with a pointer to a Node that has a specified value, if it exists, else returns null. finder() uses an instance of finderOp<Node>, derived from Operation<Node>. This nominally does the same thing, except that finder() has more control over how the search is implemented.

Typical Output:

Demo output is presented in Figure 4, which shows test code traversing the MTree and using find and finder. Also shown is the removal of an MTree<Node> node, making copies of the tree and assigning to other tree instances.

Source Code:

This M-ary Tree Demo is written in C++ using Visual Studio 2012 and compiles and runs using Visual Studio 2013 as well. You will find the code here:

MTree<Node> Demo Code

Conclusions:

The MTree<Node> package demonstrates how you can use C++ to build something close to an Abstract Data Type (ADT). We haven't made an effort to make this container compatible with the algorithms in the STL. That wouldn't be too hard to do, but the package isn't widely applicable outside of the way we will be using it, so we chose not to do that.

The Global Data Blog entry has a fairly complete discussion of code scope structure.