Blog: Code Analyzer

packages, activities, output

About

click to toggle Site Explorer

Initial Thoughts:

Code analysis is an interesting topic that takes us into many of the areas covered by our course sequence, especially CSE681 - Software Modeling and Analysis, CSE687 - Object Oriented Design and CSE776 - Design Patterns. Before reading this blog you may wish to review the Code Parsing Blog, as parsing is the most important and complex activity for the Analyzer. I'll assume that you are familiar with that material and not repeat those discussions here.

The Mission of the analyzer is to compute size and complexity metrics for all the source code files residing in the directory tree rooted at a specified path. It can also show you the Abstract Syntax Tree (AST) it built to hold the analysis results, and show the files analyzed with their line counts. Another goal of this design is to provide a structure that can easily be extended to other kinds of analysis, e.g., finding dependencies between all files in the working set, or identifying some of the flaws in a design.

The design and implementation come in two parts. The first part is a console application written in C++, using some of the features of the latest C++11 standard. It takes all of its inputs from its command line and writes its output to the console and also, if that option is chosen, to a log file deposited at the root of the analysis path. This first part implements the entire analysis functionality of the Analyzer.

The second part is a Graphical User Interface, written in the managed .Net language C++/CLI, and uses the Windows Presentation Foundation (WPF) framework¹. The purpose of the GUI is to support easily browsing to the root directory for analysis and to collect user settings. The path and settings it builds into a command line and then starts the Analyzer console application process with that.

The implementation was divided into these two parts because browsing for a starting path is much easier with a GUI using an OpenFileDialog² than from the command line with a lot of CDing to navigate. However, a console is a better place to send a lot of information which needs space and flexibility to convey a lot of resulting information.

In this discussion we look at the package structure of the analyzer, its activities, and the output it generates. At the end we will draw some conclusions about what makes this design interesting and areas for improvement.

Packages:

The Code Analyzer package structure is shown in the figure at the right. The packages focus on three areas:
- Building a List of Source Code Files:
  
  The Window package is used to identify the root directory for the analysis and to pick file patterns for the language we want to analyze, e.g., *.h, *.cpp, or *.cs.
  
  FileMgr package constructs a list of fully qualified file names that match specifications provided by the window package. It uses FileSystem package to query the file system to find child directories and files³ using operating system APIs (we have implementations for both Windows and Linux). It does its work by recursively calling its search function with the root and then discovered child directories.
- Parsing each Source Code File:
  
  The Parser package is responsible for analyzing a source file based on rules defined in the ActionsAndRules package. Parser collects its input from the scanner packages Tokenizer and SemiExp. That results in a list of tokens that has just enough information to make a decision about grammatical constructs without having to save left over information for the next round. What that last sentence means is that the token sequence has just enough tokens to search for matches in the rule collection.
  When a rule matches it invokes its actions. It is the actions that build the Abstract Syntax Tree (AST), using help from the Scope Stack. That process continues until all of the files have been analyzed.
  The AbstrSynTree and ScopeStack packages provide all the mechanics for building the AST. The AST serves as a container for all information that results from parsing all the selected files.
- Displaying Results:
  Each of the displays is build by the Display package from information contained in the AST. This package provides functionality to build:
  - Metrics Display which shows, for each function, size in line count and complexity as measured by the number of descendent scopes in the function.
  - Abstract Syntax Tree visualization, an indented list of all the nodes in the AST.
  - SLOC list, showing each file and its source line count.
  The logger package is used by display to write to the console and to a log file. It has three different kinds of output, using three different specialized loggers:
  - Results mode - used to show normal output
  - Demonstration mode - used to help users understand how the program works, mostly by looking at rule firings and the resulting actions
  - Debug mode - used to look at parsing details to help make the parsing detections and actions correct.
  Analyzser code has statements for each of these three specialized loggers. The user just turns them on and off to see the various types of output. That is done with checkboxes in the GUI Display Mode tab.
Activities:

Activities are separated, in the activities diagram at the right, into two rows. The top row shows GUI activities and the bottom shows the Analyzer activities.
- GUI Window Activities:
  Start sequentially, but may enter a loop where the user makes settings, runs the anlyzer and looks at the analysis output. Then the user may select new settings, a new path for example, and runs the analyzer again.
  - Create Views:
    
    This first activity builds the views programmatically⁴. There are three: Execution, Setup, and Display Mode views. Each view provides the contents for one of the GUI window tabs.
  - Retrieve User Settings:
    
    This activity reads a text file with the last set of user settings and populates controls on the window accordingly, e.g., checking checkboxes and writing paths into textboxes.
  - Respond to User Actions:
    
    Here, the GUI responds, via its event handler functions, to button clicks and changes in checkboxes and text in textboxes. For example, clicking on checkboxes or changing text in a textbox changes the member data of the Window class.
  - Create CodeAnalyzer Command Line:
    
    A string that represents the console application's command line is built using Window member data that was determined by extracting information from check boxes and textboxes in the previous activity.
  - Start CodeAnalyzer Process:
    
    In response to a "Start Analysis" button click the button's event handler prepares information to start the console application analyzer, passing it the command line prepared by the previous activity.
  - Save User Settings:
    
    At the end of an analysis the user settings are saved to a text file located in the same directory as the GUI and Analyzer execution images. At this point the GUI is idle until the user either clicks the kill button to terminate or makes changes to the settings to run another analysis.
- Code Analyzer activities:
  are all sequential in the large, although there are many internal loops, not shown here, that run during file analysis and parsing.
  - Process Command Line:
    
    The CodeAnalyzer starts by processing its command line. That may result in immediate termination if needed settings have been omitted. That won't happen when using the GUI, but might when users run the CodeAnalyzer from a command line. That hasn't been shown in the diagram because we show the GUI driven activities. If the command line has all the needed information, processing moves to the next activity.
  - Find Source Code Files:
    
    The CodeAnalyzer executive passes the path and file patterns to the FileMgr package for this activity. That results in a recursive descent through the directory tree rooted at the specfied path, looking for all the files that match the given patterns. That information flows into the next activity.
  - Parse Source Code:
    
    For each file in the file list the parser collects SemiExps and passes them to each of its rules until a match occurs. The match results in calls into the AbstrSynTree package to add another node to the AST or add information to an existing node on the top of the Scope Stack.
  - Build AST & Record Lines:
    
    The loop between "Parse Source Code" and "Build AST & Record Lines" is traversed for each rule detection. This looping continues from the start of the first file analysis to the end of the last. If the rule is for the beginning of a scope, a new node is pushed onto the Scope Stack. If the rule matches an end of scope event then the node is popped off the stack. For all other rules information in the node at the top of the stack is modified.
  - Complexity Analysis:
    
    When we get to this activity all the AST nodes have been added to the tree. Complexity Analysis just walks the AST and for each namespace, class, struct, or function node, records the number of descendent scope nodes. That, of course, has to happen just before we pop back to a parent node during the tree walk, at which time we record the complexity information in the current node. When we leave this activity all of the analysis information has been stored in the tree.
  - Display Metrics:
    
    The Display package gathers Metric information by walking the AST, collecting node name, type, starting line, line count, and complexity. That is then formatted into a table and written out using the logger.
    Actions store data declarations encountered in some scope in the AST node for that scope. They keep track of whether the data has public, protected, or private access specified. That information is stored along with the data declaration in the AST node for that scope. This information is shown in the metrics display and also used in the last activity of the CodeAnalyzer before termination.
  - Display SLOCs:
    
    While building the AST actions store size information in a table with filename keys. That is enumerated by the Display package to provide this display.
  - Display AST:
    
    This display is generated by the Display package by walking the AST tree, extracting information, and displaying with an indentation that is proportional to the depth of the current node in the tree.
  - Display Metrics Summary:
    
    This activity is very similar to the Metrics display processing except that the only information displayed is that which exceeds limits for function size and complexity. Also shown are all of the public data defined anywhere in the AST.
Output:

You see, in the figure at the right, a typical output for VisualCodeAnalyzer execution. The user has selected both C++ and C# file analysis, browsed to the root folder of the CodeAnalyser Solution, selected metric display, and started the CodeAnalyzer. The command line paramters for the console application are shown at the top of the display, along with the date and time of execution.
It is convenient to have the controls and output in separate windows. We can look at the console output, decide to change some execution parameters, do that in the GUI window while still observing the output, and run again. One other point - one way communication from the GUI application to the console is a lot eaiser to set up and manage than two way communcation between the executing code and GUI displays.
Notice that public data is shown in the console window just below the class or struct that owns that data, and the display also localizes it to a particular package. In this application the public data are members of a couple of structs that are strictly private to the implementation. They are never returned from functions that another code author might have to deal with. I treat public data from classes, and any construct that other code has to use, as an error of design. I do not do that for data held in private structs.

Summary

Here are things I like about this design:

The combination of GUI and Console Application. GUI for browsing and setting parameters combined with a console analysis application for execution of the analysis is simple, works well, and is very usable.
The structure is fairly simple for processing as complex as code analysis. We've built something close to a compiler front end. It is of course simpler, because we need to recognize only a small part of the C++ and C# languages. It's also surprising how much code is common for the analysis of both C++ and C# code.
The individual parts are all recognizable by name and function and they distribute the program's complexity fairly uniformily among themselves.
Not much Need to Change:
The structure is such that other applications like dependency analysis will keep almost all the packages intact, only modifying a few, like Window, ActionsAndRules, and Display, and those modifications will be small. Almost all the other of the fifteen packages will not need to change.

Here are things I don't like about the design:

Processing is not concurrent.
Analysis of each file is independent of that for every other file. The only thing that is shared is use of the Astract Syntax tree and scope stack.
- That means that we could make the expensive parsing part concurrent, provided that we let each analyzer thread have its own Abstract Syntax Tree and Scope Stack. We just run each file's analysis on a thread pool thread.
- That means that we have to build mechanics to merge the ASTs for each file, but that is close to trivial to accomplish.
- We would also have to construct a parser for each file because the parser is welded to its scanner data source and that could work only on a single file at a time. It turns out that is also easy to do. In fact some of my parser demos do just that.
So making this application concurrent should be relatively easy to do. I just haven't done that yet.
It's incomplete.
I haven't gotten around to trying on Java code. I expect that since it works on C# it's very likely to need next to no changes for Java.

The C++/CLI compiler tool chain does not have a Xaml processor so all of the WPF functionality needs to be implemented with code - no declarative implementation.
You might thing we would use a FolderBrowserDialog for selecting directories. I did not for two reasons. The first is that the FolderBrowserDialog control doesn't work very well. It does not scroll down to the selected path when you open it. You have to manually scroll and that gets to be a pain. The second reason is that sometimes we want to select specific files to process, not all those matching a pattern. For that you need the OpenFileDialog.
C++ surprisingly does not have a directory manipulation library, so I wrote one a couple of years ago and use that here.
We build views programmatically because we can't do that declaratively with C++/CLI. See footnote 1.

Initial Thoughts:

Packages:

Activities:

Output:

Summary

Here are things I like about this design:

Here are things I don't like about the design: