The Trees we are concerned with here are not balanced binary trees used for quick access to ordered data.
They are simple tree structures where each tree node may have zero or more child nodes. They are often used
as in-memory representations of tree structures that occur as artifacts of processing activities, i.e., parse trees
resulting from processing markup like XML or HTML.
Our interests focus on using M-ary Trees to support code analysis. We will use instances of a C++ template class MTree<T>, discussed here,
to capture the structure of code scopes1. For example, one measure of complexity of a function written in C, C++, Java, or C#
is the number of scopes the function uses for branching and looping to carry out its processing activities. This is
closely related to the well-know McCabe Cyclomatic Complexity metric.
MTree
Figure 1. - Code Scopes Tree Representation
MTree
Figure 2. - Code Scopes Fragment
Background:
The Spring 2014 CSE687 - Object Oriented Design class will, in their first project, build a code analyzer that, for each file it processes,
finds the size and complexity of all function definitions. We will use some of that code to build, in Project #2, a code similarity
detector. That would be useful for code refactoring: to find, for later replacement, duplicated code in a set of packages. The intent
is to replace duplicated code with invocations of an extracted function.
Eliminating redundancies avoids fixing bugs in multiple places or worse, fixing in only one of several places the bug occurs.
Our redundancy analysis seeks to find congruencies within the scope trees
of analyzed function definitions. So we will need a way to capture a representation of the these scope trees - enter the class MTree<T>.
Core Idea:
A function's scope tree is something like it's DNA. If two functions have scope subtrees that are identical in structure and the scope nodes
have similar line counts the behavior of those parts of the code may not be identical, but their processing is certainly similar. We've shown
in Figures 1 and 2 the code and scope tree of member function MTree<Node>::walk(Node*).
Each tree node has been annotated with the line count of the scope it represents. If we find, in some other function, a tree with similar structure
and line counts that becomes a candidate to examine for code duplication. We won't go into how we measure similarity - its not complicated, but
isn't relevant to a discussion of the M-ary Tree.
MTree
Figure 3. - MTree<Node> Package Classes
Design:
The MTree<Node> class and helper class, MNode<T>, are relatively simple template classes. The template parameter Node used by
MTree<Node> is expected to take the type MNode<T> where T represents the type of data held by each node. That may be a library type
like a string or some user-defined type that holds composite information.
Each instance of MNode<T> holds a vector of pointers to child nodes, created on the native C++ heap. The class provides methods to
add and remove child nodes and mark nodes for traversal. The MTree<Node> class provides a method to make an instance of its Node parameter
the tree root and assumes that the Node addChild method will be used to build out the tree.
Client programs interact with an instance of MTree<Node> in two ways. The using code may define a class derived from the Operation<Node>
class that defines what processing should occur when nodes are encountered on a depth first tree traversal. The class can provide accessor functions
to feed the operation, initializing data and collect results at the end of traversal.
Clients may also use the methods MTree<Node>::find() and MTree<Node>::finder(). find() simply returns with a pointer to a Node that
has a specified value, if it exists, else returns null. finder() uses an instance of finderOp<Node>, derived from Operation<Node>.
This nominally does the same thing, except that finder() has more control over how the search is implemented.
MTree
Figure 4. - MTree<Node> Output
Typical Output:
Demo output is presented in Figure 4, which shows test code traversing the MTree and using find and finder. Also shown is the removal
of an MTree<Node> node, making copies of the tree and assigning to other tree instances.
Source Code:
This M-ary Tree Demo is written in C++ using Visual Studio 2012 and compiles and runs using Visual Studio 2013 as well.
You will find the code here:
The MTree<Node> package demonstrates how you can use C++ to build something close to an Abstract Data Type (ADT). We haven't made an
effort to make this container compatible with the algorithms in the STL. That wouldn't be too hard to do, but the package isn't widely applicable
outside of the way we will be using it, so we chose not to do that.