There are several types of software tests, each with its own specific objectives:
Construction Tests - run by Developers:
As we build a software package we lay out its structure with declarations of one or more classes.
These are implemented by adding a few lines of code to a class method, adding a test for that method
in a construction test stub1, then building and executing the new test. Doing this continuously allows us
to keep the software always running - at first just doing simple things, but progressively adding more
detail, until the class meets its complete set of obligations. We do that for each class until
our package is complete. Then we repeat with the next package and so on until the project has been completed.
Unit Tests - run by Developers:
It is essential that we make packages, on which other packages depend, as free of errors as is practical.
Any errors in a low-level package imply that the operation of most packages that depend on the flawed
package will also be flawed. A unit test is an eyes-on, debugger walk-through that exercises every line
of code, function-by-function, toggles every predicate, and examines the results carefully to ensure correct operation.
These are expensive tests, and we often choose not to unit test high-level packages, especially
Graphical User Interfaces2, because their operations should be simple and clear.
Integration Tests - run by Developers and QA:
Integration tests are automated tests that exercise changed or newly introduced
code in the context of the entire project baseline3, or some large subset of that.
When we create a new package or modify an existing package, the changes we make may break not only packages
that directly depend on the new version, but also other packages that are not obviously connected. For
example, a package that manipulates data in a database may introduce changes to the database that break other
code that depends on the database state. This can happen even if the new version has no compilation dependency
relationships with the other package.
Integration tests are aggregated
over the course of development of a large project and are run frequently to ensure that new code doesn't break
the existing baseline. Each integration test driver (there will be many of them) has, embedded in comments of its
source code, a test description that defines what the test is trying to demonstrate and the steps it uses to do that.
Regression Tests - run by QA:
Regression tests are very similar to Integration tests. The only difference is their purpose. We use
regression testing when porting an application to a new or modified platform, and when we deploy the
application to another site. Here we run a sequence of tests on the entire baseline to show that it still
functions in the same way it did in its original environment.
Performance and Stress Testing - run by SW Architect and technical Developers:
Performance tests are run to see if the system is meeting its allocated time and CPU resource budgets. Stress
tests are designed to exceed the required input load capacity and demand more of the platform's resources than
expected. The purpose of these tests is to ensure that the system survives unusual loads and resource constraints,
continuing to operate, although it may not satisfy all its obligations in those extreme environments.
Qualification Tests - run by QA under supervision of a Program Manager:
The final demonstration of a large software system, called a Qualification Test, is used to demonstrate that the
system meets all of its obligations to the
customer as laid out in a system specification. This qualification testing is a legalistic, not technical, test.
It should be very organized, detailed, and boring4. Qualification testing uses both test automation, like
regression tests, and hands-on probing and demonstration of the system's operational details to the customer.
Implementing Tests:
Construction tests are implemented manually as you work on the initial construction of a system. Every package has
a main function that is included or excluded from compilation by a compiler directive. We refer to this main function
as a test stub; a minimal sketch appears after the steps below. During construction testing for each package we:
Lay out class declarations that are initially empty. This allows us to think about the functionality required
and the language we want to provide for higher level packages to use.
Begin populating classes with methods, one-at-a-time. Each method starts with a few lines of code.
For each new small addition of code we may add a test in the test stub, then build and execute. When there are flaws
in our implementation, we know where to look - it is very likely to be in the code we just added.
Repeat this process until the package has become fully functional.
Repeat all of the steps, above, for each package in the project.
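As a concrete illustration, here is a minimal sketch of such a package in C# (the names are hypothetical, not a prescribed design). The test stub main function is enclosed in a compiler directive, as described in the endnotes, so the package can be built either for stand-alone construction testing or for integration into a larger body of code.

    // StringUtil.cs - hypothetical package illustrating a construction test stub
    using System;

    public class StringUtil
    {
        // first small increment of functionality
        public static string Reverse(string s)
        {
            char[] chars = s.ToCharArray();
            Array.Reverse(chars);
            return new string(chars);
        }

    #if TEST_STRINGUTIL
        // test stub - compiled only when TEST_STRINGUTIL is defined
        static void Main()
        {
            Console.WriteLine("testing StringUtil.Reverse");
            Console.WriteLine(StringUtil.Reverse("construction"));   // expect "noitcurtsnoc"
        }
    #endif
    }

Building with csc -define:TEST_STRINGUTIL StringUtil.cs yields a stand-alone test executable; building with -target:library and without the symbol yields a DLL ready for integration into larger code.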
Unit tests walk through every line of code with a debugger, ensuring that every branch has been
activated by test inputs. We do this function-by-function for each class in the package.
This means that we will need to:
Do some initial planning for test execution.
Build a test driver that supplies inputs and may log package state and results.
The driver provides inputs that drive the code into each of its possible states5.
For complex functions we may need to accept data stored in files or a database and record results in files or database state.
Each test driver records, in comments, the resources it uses6, e.g., files and databases.
When unit testing of a package is complete, the test driver is saved in a repository for future use.
For critical packages we may repeat unit tests for every new version.
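A hand-written unit test driver of the kind described above might look like the sketch below (C#, with hypothetical names). The driver supplies inputs chosen to toggle each predicate in the tested function, logs time-stamped results, records in a comment the resources it uses, and is walked through under a debugger line-by-line.

    // UnitTestDriver.cs - hypothetical unit test driver
    // resources used: none - all inputs are supplied in-line
    using System;

    public static class Grader
    {
        // function under test - the driver must toggle both branches
        public static string Classify(int score)
        {
            if (score < 0 || score > 100)
                throw new ArgumentOutOfRangeException(nameof(score));
            return score >= 60 ? "pass" : "fail";
        }
    }

    public static class UnitTestDriver
    {
        static void Check(string label, string expected, string actual)
        {
            string status = (expected == actual) ? "passed" : "FAILED";
            Console.WriteLine("{0}  {1}: expected \"{2}\", got \"{3}\" - {4}",
                              DateTime.Now, label, expected, actual, status);
        }

        static void Main()
        {
            Check("boundary pass", "pass", Grader.Classify(60));   // drives the true branch
            Check("boundary fail", "fail", Grader.Classify(59));   // drives the false branch
            try { Grader.Classify(-1); Console.WriteLine("out-of-range input: FAILED - no exception"); }
            catch (ArgumentOutOfRangeException) { Console.WriteLine("{0}  out-of-range input: passed", DateTime.Now); }
        }
    }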
There are also frameworks like NUnit and JUnit that support testing .Net and Java code, respectively. Developers
call these Unit test frameworks, but the testing they support is a combination of parts of unit testing and integration
testing.
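For comparison, here is a minimal sketch of the same checks written as NUnit tests for the hypothetical Grader class from the previous sketch; the framework discovers the [Test] methods, runs them, and reports results, taking over part of the driver and logging work.

    // GraderTests.cs - hypothetical NUnit tests for the Grader class
    using NUnit.Framework;

    [TestFixture]
    public class GraderTests
    {
        [Test]
        public void Classify_ScoreOfSixty_ReturnsPass()
        {
            Assert.AreEqual("pass", Grader.Classify(60));
        }

        [Test]
        public void Classify_ScoreBelowSixty_ReturnsFail()
        {
            Assert.AreEqual("fail", Grader.Classify(59));
        }

        [Test]
        public void Classify_NegativeScore_Throws()
        {
            Assert.Throws<System.ArgumentOutOfRangeException>(() => Grader.Classify(-1));
        }
    }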
Integration tests are a fundamental part of continuous software integration and are used by
each developer and Quality Assurance (QA) to test code against the
current project baseline or some significant part of the baseline. These are automated tests for which we need:
A Test Harness that loads and executes tests on demand.
For each test, the Test Harness loads a dynamic link library (DLL)7 that contains
a test driver and the code to be tested8.
The test driver derives from an ITest interface and supplies a factory class with a static method to create instances
of all the classes used by the test.
A source of test inputs that may be supplied by the developer or, where feasible, by the test harness.
A mechanism for logging test results and possibly internal state values. These logs are named, identify the test
developer and the code tested, including its version, and are time-date stamped for each test execution.
A code Repository that holds the current project baseline and can deliver, on demand, DLLs for all the code on which
the tested code depends or may affect. Note that this implies the repository has the capability to evaluate calling dependency
relationships.
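The details above are project-specific, but a minimal sketch of those pieces might look like the following in C# (the exact shape of ITest is an assumption, and reflection over the loaded types stands in for the factory class described above): a test driver that implements ITest, and a harness fragment that loads a test DLL and instantiates every driver it finds.

    // ITest.cs - contract shared by the Test Harness and every test driver (assumed shape)
    public interface ITest
    {
        bool Test();        // returns true if the test passed
        string GetLog();    // returns the log accumulated while testing
    }

    // GraderTestDriver.cs - built into the test DLL along with the code it tests
    public class GraderTestDriver : ITest
    {
        private readonly System.Text.StringBuilder log = new System.Text.StringBuilder();

        public bool Test()
        {
            bool passed = Grader.Classify(60) == "pass" && Grader.Classify(59) == "fail";
            log.AppendLine(System.DateTime.Now + "  GraderTestDriver: " + (passed ? "passed" : "FAILED"));
            return passed;
        }

        public string GetLog() { return log.ToString(); }
    }

    // Harness fragment - loads a named test DLL and runs every ITest implementation it contains
    public static class HarnessLoader
    {
        public static void Run(string dllPath)
        {
            var assembly = System.Reflection.Assembly.LoadFrom(dllPath);
            foreach (var type in assembly.GetTypes())
            {
                if (typeof(ITest).IsAssignableFrom(type) && !type.IsInterface && !type.IsAbstract)
                {
                    var test = (ITest)System.Activator.CreateInstance(type);
                    System.Console.WriteLine("{0} -> {1}", type.Name, test.Test() ? "passed" : "failed");
                    System.Console.Write(test.GetLog());
                }
            }
        }
    }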
Test requests consist of a message, perhaps in XML format, that lists a series of one or more tests to execute. The
Test Harness will execute each request on its own thread, probably using a thread pool, and execute
each test in the request sequentially. A developer is likely to submit test requests with only a few tests while
QA will submit test requests with many tests, perhaps all of the tests that have been defined for a specific subsystem.
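A sketch of how the harness might dispatch such requests is shown below, assuming a simple TestRequest type (hypothetical) that lists the test DLLs to run; each request is handed to a thread-pool thread, and the tests inside it run sequentially, as described above.

    // Hypothetical request dispatching inside the Test Harness
    using System.Collections.Generic;
    using System.Threading.Tasks;

    public class TestRequest
    {
        public string Author { get; set; }
        public List<string> TestDlls { get; } = new List<string>();   // tests to run, in order
    }

    public static class RequestDispatcher
    {
        // each request gets its own thread-pool thread; its tests run one after another
        public static Task Dispatch(TestRequest request)
        {
            return Task.Run(() =>
            {
                foreach (string dll in request.TestDlls)
                    HarnessLoader.Run(dll);    // loader from the previous sketch
            });
        }
    }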
Test drivers and test resources are stored in the Repository, as they will be used many times, some perhaps
thousands of times over the lifetime of a project. Essentially test drivers are part of the software baseline.
Ideally the Repository and Test Harness are designed to work together.
Existing configuration management systems can be made
to work as repositories in the sense we discuss here, but it is hard to get them to deliver just what is needed for a given
test. More likely they clone massive parts of the baseline and give us many more parts to deal with than needed by each test.
That happens because they don't make dependency graphs available. They just provide clones of named parts. Therefore, the
developer has to figure out what is needed, or, more likely, will just clone big globs of code that hopefully contain all of
the depended upon code. Note that a developer certainly knows the packages on which his code depends directly, but is unlikely
to know all of the dependency descendants of those packages.
For the Repository, I prefer a dedicated server that stores each version of each package only once,
has metadata for each package that describes its dependency relationships - thus forming a virtual dependency graph for each
package - and contains useful descriptive information in the metadata to support browsing and analysis. It supplies, on demand,
the entire dependency graph of a named component9.
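A sketch of the metadata walk such a Repository might perform (described further in the endnotes), with assumed types, is shown here: each package version records the package versions it depends on, and the Repository flattens that virtual graph into the list of filespecs the Test Harness needs.

    // Hypothetical Repository metadata and dependency-graph walk
    using System.Collections.Generic;

    public class PackageMetadata
    {
        public string Name { get; set; }
        public string Version { get; set; }
        public string FileSpec { get; set; }      // where this package version's DLL is stored
        public List<string> Dependencies { get; } = new List<string>();   // keys of depended-on packages
    }

    public class Repository
    {
        // key is "Name-Version", e.g. "Grader-1.2"
        private readonly Dictionary<string, PackageMetadata> catalog =
            new Dictionary<string, PackageMetadata>();

        // returns the filespecs of the named package and all of its dependency descendants
        public List<string> DependencyClosure(string key)
        {
            var visited = new HashSet<string>();
            var files = new List<string>();
            Walk(key, visited, files);
            return files;
        }

        private void Walk(string key, HashSet<string> visited, List<string> files)
        {
            if (!visited.Add(key)) return;     // already visited - avoids repeats and cycles
            PackageMetadata meta = catalog[key];
            files.Add(meta.FileSpec);
            foreach (string child in meta.Dependencies)
                Walk(child, visited, files);
        }
    }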
The Repository will need to provide build facilities for creating DLLs needed for testing. These will almost certainly
be cached in the Repository baseline. It will also need to support versioning information and ownership policies to determine
who is allowed to check in changes to any package.
Here's a link that discusses using git for large builds and suggests using Maven to manage project dependencies - Maven was designed
to support Java development:
Code Dependencies - pain point with classic config mgrs like git
There are dependency managers for .Net, none of which I've used yet. Here are some links:
Paket - package manager with dependency resolution
nuget - package manager - dependency resolution?
Using nuget with repositories
Regression Tests are really just the same tests used for integration, but are used for other purposes, as
described in the Initial Thoughts section. Their implementation will not differ from integration tests in any significant way.
Performance Tests are specialized tests that use the integration test framework with drivers that are
designed specifically for timing and throughput analysis. Instead of running many tests on different parts of the baseline,
they run many tests repeatedly on the same code, collecting execution times for each run and then building outputs that
provide statistics on execution times and throughput.
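For example, a performance test driver built on that idea might look like the sketch below (hypothetical names, using the Grader class from earlier sketches as a stand-in for the code under measurement): it runs the same operation many times under a Stopwatch, then reports simple timing and throughput figures that a fuller driver would extend with percentile statistics.

    // Hypothetical performance test driver fragment
    using System;
    using System.Diagnostics;
    using System.Linq;

    public static class PerfDriver
    {
        static void Main()
        {
            const int runs = 1000;
            var timesMs = new double[runs];
            var sw = new Stopwatch();

            for (int i = 0; i < runs; ++i)
            {
                sw.Restart();
                Grader.Classify(60);                 // operation under measurement (stand-in)
                sw.Stop();
                timesMs[i] = sw.Elapsed.TotalMilliseconds;
            }

            Console.WriteLine("runs:       {0}", runs);
            Console.WriteLine("mean:       {0:F4} ms", timesMs.Average());
            Console.WriteLine("max:        {0:F4} ms", timesMs.Max());
            Console.WriteLine("throughput: {0:F0} ops/sec", runs / (timesMs.Sum() / 1000.0));
        }
    }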
Stress tests are specialized tests that work much like performance tests. For performance testing we
provide inputs and an environment that is specified for the application. For stress testing we provide inputs that
exceed the specified load and modify environments in ways that the application was not expected to tolerate. We look
critically for any error states from which the application cannot recover while still running. We also look for
corruption of persistent state in databases or files that will prevent the system from successfully restarting into
a fully functional state.
Qualification Tests are very thorough examinations of the operations of a system to ensure that it
meets all of its specified obligations. A Test Harness and Repository can be
excellent resources for Qualification tests.
The extent of Qualification testing is very much dependent on the size of each project and the context in which the
system is built, e.g., large projects funded by government agencies vs. internal industrial projects carried out to
extend existing products or to implement a new product.
You may wish to look at CSE681 - SMA: Project #5 to see how these parts,
Test Harness and Repository,
fit into a Software Development Environment.
Your use of Testing in Courses:
In the courses you will take in your program of study, as you work on class projects, you should always use construction
testing, starting with a simple system shell that doesn't do much but does compile and run. You add functionality,
package-by-package, in small pieces, adding construction tests with each small addition of functionality. This way
you keep the system always running and progressively acquire more functionality until you are finished or the Project
due date has arrived.
We discuss test frameworks in CSE681 - Software Modeling and Analysis (SMA), and also briefly in
CSE687 - Object Oriented Design (OOD).
For example, in the Fall of 2016 we are developing, in CSE681 - Software Modeling and Analysis, a test harness that
could be used in a collaboration system for software development.
You probably will not engage in Regression, Performance, Stress, or Qualification testing as part of your program of study, but
may acquire that as on the job training once you are working professionally. We used to do some of that in CSE784 - Software Studio,
but unfortunately we are no longer able to offer that course.
Each package contains a main function enclosed with compiler directives that allow us to compile for stand-alone operation
using the main, or for integration into a larger body of code, without the main. We call this main function a
test stub.
Graphical User Interfaces should focus on accepting program data from the user and displaying results. They should do
no other computation. Any computation they need should be delegated to another package that can be tested effectively
in a test framework.
The project baseline is all the code associated with the project's current state. It includes only code that has been
successfully tested and checked into the project's Repository. That includes both product code and test drivers.
If a Qualification test is exciting, we have problems. Qualification should proceed in an orderly fashion, demonstrating
to the customer(s) that the system works as specified. We don't want any unexpected behavior or lack of expected behavior.
If a function has high complexity, it may be untestable. That is, it may be far easier to throw away the function and
rebuild using a simpler structure than to test all the permutations of its many states.
These comments form a test description for the tested code. It would be relatively easy to build an analysis tool that
extracts these "descriptions" from all of the test drivers and builds a test document in real time - always up-to-date
and as accurate as the comments in the driver code.
A dynamic link library (DLL) is a compiled binary that is constructed to load into an executable image at run-time.
Unix and Linux developers refer to these as "shared" libraries.
It would be equally effective to build separate DLLs for the test driver and tested code, where the test driver loads
the tested code DLL when it is loaded.
The Repository recursively scans metadata, starting at the node for the requested component, and recursing into all its
descendants, capturing a list of all the filespecs it encounters. This list is what the Test Harness uses to get the
DLLs it needs for any given test. Obviously it would be desirable to cache the downloaded DLLs so we don't keep sending
some of the same components test-after-test.