Blog: Code Analyzer
11/28/2024
packages, activities, output
Initial Thoughts:
-
Packages:
-
Building a List of Source Code Files: The Window package is used to identify the root directory for the analysis and to pick file patterns for the language we want to analyze, e.g., *.h, *.cpp, or *.cs.
The FileMgr package constructs a list of fully qualified file names that match the specifications provided by the Window package. It uses the FileSystem package to query the file system for child directories and files (see footnote 3) using operating system APIs (we have implementations for both Windows and Linux). It does its work by recursively calling its search function, first with the root and then with each discovered child directory.
Parsing each Source Code File: The Parser package is responsible for analyzing a source file based on rules defined in the ActionsAndRules package. Parser collects its input from the scanner packages Tokenizer and SemiExp. That results in a list of tokens that has just enough information to make a decision about grammatical constructs without having to save leftover information for the next round. In other words, the token sequence has just enough tokens to search for matches in the rule collection. When a rule matches, it invokes its actions. It is the actions that build the Abstract Syntax Tree (AST), using help from the Scope Stack. That process continues until all of the files have been analyzed. The AbstrSynTree and ScopeStack packages provide all the mechanics for building the AST. The AST serves as a container for all information that results from parsing all the selected files.
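The rule and action arrangement can be sketched roughly as follows. This is an illustration only, not the Parser or ActionsAndRules code: Rule, TokenList, addRule, and parse are hypothetical names, and a plain vector of strings stands in for one token sequence from the scanner.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

using TokenList = std::vector<std::string>;  // stands in for one token sequence

// A rule pairs a test on the token sequence with actions to run on a match.
struct Rule {
  std::function<bool(const TokenList&)> test;
  std::vector<std::function<void(const TokenList&)>> actions;
};

class Parser {
public:
  void addRule(Rule r) { rules_.push_back(std::move(r)); }

  // Offer the tokens to each rule in turn; the first match fires its actions.
  bool parse(const TokenList& toks) const {
    for (const auto& rule : rules_) {
      if (rule.test(toks)) {
        for (const auto& act : rule.actions) act(toks);
        return true;
      }
    }
    return false;  // no rule recognized this token sequence
  }

private:
  std::vector<Rule> rules_;
};
```

For example, a rule whose test checks that the last token is "{" could carry an action that adds a new scope node to the AST.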
-
Displaying Results: Each of the displays is built by the Display package from information contained in the AST. This package provides functionality to build:
- Metrics Display which shows, for each function, size in line count and complexity as measured by the number of descendant scopes in the function.
- Abstract Syntax Tree visualization, an indented list of all the nodes in the AST.
- SLOC list, showing each file and its source line count.
- Results mode - used to show normal output
- Demonstration mode - used to help users understand how the program works, mostly by looking at rule firings and the resulting actions
- Debug mode - used to look at parsing details to help make the parsing detections and actions correct.
-
-
Activities:
Activities are separated, in the activities diagram at the right, into two rows. The top row shows GUI activities and the bottom shows the Analyzer activities.
GUI Window Activities: These start sequentially, but may enter a loop where the user makes settings, runs the analyzer, and looks at the analysis output. Then the user may select new settings, a new path for example, and run the analyzer again.
-
Create Views: This first activity builds the views programmatically (see footnote 4). There are three: Execution, Setup, and Display Mode views. Each view provides the contents for one of the GUI window tabs.
-
Retrieve User Settings: This activity reads a text file with the last set of user settings and populates controls on the window accordingly, e.g., checking checkboxes and writing paths into textboxes.
-
Respond to User Actions: Here, the GUI responds, via its event handler functions, to button clicks and changes in checkboxes and text in textboxes. For example, clicking on checkboxes or changing text in a textbox changes the member data of the Window class.
-
Create CodeAnalyzer Command Line: A string that represents the console application's command line is built using Window member data that was determined by extracting information from checkboxes and textboxes in the previous activity.
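A minimal sketch of that string building, in standard C++ rather than the C++/CLI the GUI uses. Everything named here is an assumption: WindowSettings and buildCommandLine are hypothetical, and the argument order and any option switches are placeholders, not the analyzer's documented command line format.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the Window member data described above.
struct WindowSettings {
  std::string path;                   // root directory for the analysis
  std::vector<std::string> patterns;  // e.g. "*.h", "*.cpp", "*.cs"
  std::vector<std::string> options;   // placeholder switches
};

// Join the settings into one command line string, quoting the path so that
// a directory name containing spaces survives as a single argument.
std::string buildCommandLine(const std::string& exe, const WindowSettings& s) {
  std::string cmd = exe + " \"" + s.path + "\"";
  for (const auto& p : s.patterns) cmd += " " + p;
  for (const auto& o : s.options) cmd += " " + o;
  return cmd;
}
```

The event handler for the start button would then pass this string to whatever process-launching facility the GUI framework provides.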
-
Start CodeAnalyzer Process: In response to a "Start Analysis" button click, the button's event handler prepares information to start the console application analyzer, passing it the command line prepared by the previous activity.
-
Save User Settings: At the end of an analysis, the user settings are saved to a text file located in the same directory as the GUI and Analyzer execution images. At this point the GUI is idle until the user either clicks the kill button to terminate or makes changes to the settings to run another analysis.
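Saving and retrieving settings from a text file might look like the sketch below. The format of the real settings file is not described here, so the one "key=value" pair per line layout is purely an assumption, as are the names Settings, saveSettings, and loadSettings.

```cpp
#include <fstream>
#include <map>
#include <string>

using Settings = std::map<std::string, std::string>;

// Write one "key=value" pair per line (assumed format).
bool saveSettings(const std::string& file, const Settings& s) {
  std::ofstream out(file);
  if (!out) return false;
  for (const auto& kv : s) out << kv.first << '=' << kv.second << '\n';
  return out.good();
}

// Read the pairs back; lines without '=' are skipped. A missing file
// simply yields an empty Settings map.
Settings loadSettings(const std::string& file) {
  Settings s;
  std::ifstream in(file);
  std::string line;
  while (std::getline(in, line)) {
    auto pos = line.find('=');
    if (pos != std::string::npos) s[line.substr(0, pos)] = line.substr(pos + 1);
  }
  return s;
}
```

On startup the GUI would call the load function and use the values to populate its controls, falling back to defaults when the map is empty.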
-
-
Code Analyzer Activities: These are all sequential in the large, although there are many internal loops, not shown here, that run during file analysis and parsing.
Process Command Line: The CodeAnalyzer starts by processing its command line. That may result in immediate termination if needed settings have been omitted. That won't happen when using the GUI, but might when users run the CodeAnalyzer from a command line. That hasn't been shown in the diagram because we show the GUI-driven activities. If the command line has all the needed information, processing moves to the next activity.
-
Find Source Code Files: The CodeAnalyzer executive passes the path and file patterns to the FileMgr package for this activity. That results in a recursive descent through the directory tree rooted at the specified path, looking for all the files that match the given patterns. That information flows into the next activity.
-
Parse Source Code: For each file in the file list, the parser collects SemiExps and passes them to each of its rules until a match occurs. The match results in calls into the AbstrSynTree package to add another node to the AST or add information to an existing node on the top of the Scope Stack.
-
Build AST & Record Lines: The loop between "Parse Source Code" and "Build AST & Record Lines" is traversed for each rule detection. This looping continues from the start of the first file analysis to the end of the last. If the rule is for the beginning of a scope, a new node is pushed onto the Scope Stack. If the rule matches an end of scope event, then the node is popped off the stack. For all other rules, information in the node at the top of the stack is modified.
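The push, pop, and modify behavior can be sketched as below. This is not the AbstrSynTree or ScopeStack code: Node, AstBuilder, beginScope, endScope, and current are hypothetical names chosen for the illustration, and the assumption that each new node is attached as a child of the node on top of the stack is mine.

```cpp
#include <cstddef>
#include <memory>
#include <stack>
#include <string>
#include <utility>
#include <vector>

struct Node {
  std::string type, name;
  std::size_t startLine = 0, endLine = 0;
  std::vector<std::unique_ptr<Node>> children;
};

class AstBuilder {
public:
  AstBuilder() : root_(std::make_unique<Node>()) {
    root_->type = "namespace";
    root_->name = "Global";
    scopes_.push(root_.get());
  }
  // Beginning-of-scope rule: attach a child to the current scope and push it.
  void beginScope(const std::string& type, const std::string& name, std::size_t line) {
    auto node = std::make_unique<Node>();
    node->type = type;
    node->name = name;
    node->startLine = line;
    Node* raw = node.get();
    scopes_.top()->children.push_back(std::move(node));
    scopes_.push(raw);
  }
  // End-of-scope rule: record the closing line and pop back to the parent.
  void endScope(std::size_t line) {
    if (scopes_.size() > 1) {  // never pop the root
      scopes_.top()->endLine = line;
      scopes_.pop();
    }
  }
  // Any other rule modifies the node at the top of the stack.
  Node& current() { return *scopes_.top(); }
  const Node& root() const { return *root_; }

private:
  std::unique_ptr<Node> root_;
  std::stack<Node*> scopes_;
};
```

The stack holds only non-owning pointers. The tree owns the nodes, so popping a scope leaves its node in place in the AST.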
-
Complexity Analysis: When we get to this activity, all the AST nodes have been added to the tree. Complexity Analysis just walks the AST and, for each namespace, class, struct, or function node, records the number of descendant scope nodes. That, of course, has to happen just before we pop back to a parent node during the tree walk, at which time we record the complexity information in the current node. When we leave this activity, all of the analysis information has been stored in the tree.
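A minimal sketch of such a walk is shown below, assuming a simplified Node type that is not the one in AbstrSynTree. For brevity it records a count in every node rather than only in namespace, class, struct, and function nodes. The function name computeComplexity is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Node {
  std::string type, name;
  std::size_t complexity = 0;
  std::vector<Node> children;
};

// Post-order walk: returns the number of scope nodes below n, not counting
// n itself, and stores that count in n just before returning to the parent.
std::size_t computeComplexity(Node& n) {
  std::size_t descendants = 0;
  for (auto& child : n.children)
    descendants += 1 + computeComplexity(child);  // the child plus its subtree
  n.complexity = descendants;
  return descendants;
}
```

Because each node's count is stored only after all of its children have been visited, a single pass over the tree is enough.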
-
Display Metrics: The Display package gathers metric information by walking the AST, collecting node name, type, starting line, line count, and complexity. That is then formatted into a table and written out using the logger. Actions store data declarations encountered in some scope in the AST node for that scope. They keep track of whether the data has public, protected, or private access specified. That information is stored along with the data declaration in the AST node for that scope. This information is shown in the metrics display and also used in the last activity of the CodeAnalyzer before termination.
-
Display SLOCs: While building the AST, actions store size information in a table keyed by filename. The Display package enumerates that table to provide this display.
-
Display AST: This display is generated by the Display package by walking the AST, extracting information, and displaying it with an indentation that is proportional to the depth of the current node in the tree.
-
Display Metrics Summary: This activity is very similar to the Metrics display processing except that the only information displayed is that which exceeds limits for function size and complexity. Also shown are all of the public data defined anywhere in the AST.
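The filtering step of the summary might be sketched as follows. FunctionMetric and exceedsLimits are hypothetical names, and the limits are parameters because the actual limit values are not stated here.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

struct FunctionMetric {
  std::string name;
  std::size_t lines;
  std::size_t complexity;
};

// Keep only the functions that exceed either the size or complexity limit.
std::vector<FunctionMetric> exceedsLimits(const std::vector<FunctionMetric>& all,
                                          std::size_t maxLines,
                                          std::size_t maxComplexity) {
  std::vector<FunctionMetric> flagged;
  std::copy_if(all.begin(), all.end(), std::back_inserter(flagged),
               [=](const FunctionMetric& m) {
                 return m.lines > maxLines || m.complexity > maxComplexity;
               });
  return flagged;
}
```

The input vector would come from the same AST walk that feeds the full metrics display.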
-
-
-
Output:
You see, in the figure at the right, a typical output for VisualCodeAnalyzer execution. The user has selected both C++ and C# file analysis, browsed to the root folder of the CodeAnalyzer Solution, selected metric display, and started the CodeAnalyzer. The command line parameters for the console application are shown at the top of the display, along with the date and time of execution.
It is convenient to have the controls and output in separate windows. We can look at the console output, decide to change some execution parameters, do that in the GUI window while still observing the output, and run again. One other point - one-way communication from the GUI application to the console is a lot easier to set up and manage than two-way communication between the executing code and GUI displays.
Notice that public data is shown in the console window just below the class or struct that owns that data, and the display also localizes it to a particular package. In this application the public data are members of a couple of structs that are strictly private to the implementation. They are never returned from functions that another code author might have to deal with. I treat public data in classes, and in any construct that other code has to use, as an error of design. I do not do that for data held in private structs.
Summary
Here are things I like about this design:
- The combination of GUI and Console Application. GUI for browsing and setting parameters combined with a console analysis application for execution of the analysis is simple, works well, and is very usable.
- The structure is fairly simple for processing as complex as code analysis. We've built something close to a compiler front end. It is of course simpler, because we need to recognize only a small part of the C++ and C# languages. It's also surprising how much code is common for the analysis of both C++ and C# code.
- The individual parts are all recognizable by name and function, and they distribute the program's complexity fairly uniformly among themselves.
-
Not much Need to Change:
The structure is such that other applications, like dependency analysis, will keep almost all the packages intact, only modifying a few, like Window, ActionsAndRules, and Display, and those modifications will be small. Almost all the rest of the fifteen packages will not need to change.
Here are things I don't like about the design:
-
Processing is not concurrent.
Analysis of each file is independent of that for every other file. The only thing that is shared is use of the Abstract Syntax Tree and Scope Stack.
- That means that we could make the expensive parsing part concurrent, provided that we let each analyzer thread have its own Abstract Syntax Tree and Scope Stack. We would just run each file's analysis on a thread pool thread.
- That means that we have to build mechanics to merge the ASTs for each file, but that is close to trivial to accomplish.
- We would also have to construct a parser for each file because the parser is welded to its scanner data source, and that can work on only a single file at a time. It turns out that is also easy to do. In fact, some of my parser demos do just that.
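A rough sketch of that concurrent arrangement is below. It is not something the analyzer does today. It swaps the thread pool for std::async with std::launch::async, which starts a separate thread per file, and it reduces the merge to collecting the per-file results in order. FileAst, analyzeFile, and analyzeAll are hypothetical names, and analyzeFile is only a placeholder for constructing a parser and building that file's tree.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical per-file result: each task owns its own tree, so the
// analysis itself needs no locking.
struct FileAst {
  std::string file;
  std::vector<std::string> nodes;  // stands in for a real AST
};

// Placeholder for constructing a parser for one file and running it.
FileAst analyzeFile(const std::string& file) {
  return FileAst{file, {"namespace Global"}};
}

// Launch one asynchronous analysis per file, then gather the results in
// the original file order as a very simple form of merging.
std::vector<FileAst> analyzeAll(const std::vector<std::string>& files) {
  std::vector<std::future<FileAst>> pending;
  for (const auto& f : files)
    pending.push_back(std::async(std::launch::async, analyzeFile, f));

  std::vector<FileAst> merged;
  for (auto& p : pending) merged.push_back(p.get());  // blocks until that file is done
  return merged;
}
```

With many files, a fixed-size pool would avoid creating one thread per file. On some toolchains this code also needs a threading flag such as -pthread at link time.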
-
It's incomplete.
I haven't gotten around to trying it on Java code. Since it works on C#, I expect it is very likely to need next to no changes for Java.
Footnotes:
1. The C++/CLI compiler tool chain does not have a XAML processor, so all of the WPF functionality needs to be implemented with code - no declarative implementation.
2. You might think we would use a FolderBrowserDialog for selecting directories. I did not, for two reasons. The first is that the FolderBrowserDialog control doesn't work very well. It does not scroll down to the selected path when you open it. You have to scroll manually, and that gets to be a pain. The second reason is that sometimes we want to select specific files to process, not all those matching a pattern. For that you need the OpenFileDialog.
3. C++ surprisingly does not have a directory manipulation library, so I wrote one a couple of years ago and use that here.
4. We build views programmatically because we can't do that declaratively with C++/CLI. See footnote 1.