Blog: Graph

directed graph, vector arena

About

click to toggle Site Explorer

Initial Thoughts:

A graph, of the kind discussed here, is a collection of vertices connected by edges. We'll be concerned exclusively with directed graphs where each edge has a direction from its parent vertex to some child vertex in the collection. Directed graphs are particularly useful for describing large sets of relationships. Here are some example graph models:

Build dependency relationships between software packages
Calling relationships between functions in a program
Relationships between types in a system, e.g., inheritance, composition, aggregation, and using
Communication connections, e.g., web hosts and client machines

Frequently graph models are very large, as in models of the internet developed by "The Cooperative Association for Internet Data Analysis" (CAIDA). Therefore, we will be particularly interested in means to persist models to and from disk storage, to make storage parsimonius, and design operations to be as efficient as is reasonably practical.

Figure 1. - Graph Adjacency List

Figure 2. - Graph Structure

Figure 3. - Graph Classes

A C++ Template-based Graph Facility

Frequently graph structures are discussed in terms of adjacency lists like the example in the first diagram. Note that the adjacency list holds redundant nodes to represent graph vertices, e.g., there are three nodes that represent vertex 4 in the diagram.

We won't want to directly implement this structure, not only because of the unnecessary storage, but also because that opens up the possiblity of two or more representations of a node having different properties and values. When there are storage redundancies errors like that are relatively easy to make and often very difficult to find.

The linked structure shown in the next diagram is preferable. It has no redundancies - each vertex holds references to child vertices, where both parent and children reside in the same collection. These references could be implemented with pointers, but we will choose not to do that, and use indexes into the adjacency collection to refer to child vertices. That makes creation of copies trivial. In fact, the compiler created member-wise copy constructor and assignment operator will be correct for that design.

Design:

The Graph's logical structure is shown in the third diagram. The graph class is template-based with parameters V, a vertex type, and E, an edge type.

These could simply be strings that label vertices and edges. In this case a vertex might represent a town on a map with edges that are roads from that town to another.

Often, however, we may need a vertex or an edge to hold composite information containing several different values. In this case we simply define a V class and/or an E class with the appropriate structure to hold that information.

The graph holds a collection of vertices in a std::vector<Vertex<V,E>>, named adj for adjacency list. Each vertex holds an instance of the V type and a collection of edges in a std::vector<Edge>. An edge is simply a typedef for std::pair<size_t, E>. The size_t argument is an index into the graph's vertex collection, selecting the edge's target vertex. The E argument is an instance of the E type, so each edge can hold a unique instance.

Simple XmlReader and XmlWriter classes are part of the graphLib archive and are used to create and retrieve graph models from disk storage. These are not Document Object Model (DOM) based, but simply implemented with string manipulation and a scope stack. You may find these usesful for other applicatations as well.

Source Code:

This graph implementation is written in C++ using Visual Studio 2013. It should port, with little difficulty, to Gnu gcc to run on Linux, for example.

Directed Graph Library

Contributions

One of my doctoral advisees, Mehmet Kaya developed strong component and topological softing algorithms for this graph class. He has been using that to help develop new ways of restructuring C++ source code to improve its structure and maintainability. Here is a Syracuse University Technical Report describing that work. Another advisee, Mubarek Mohammed, is using the graph class to support research on visualization for large software systems.

Conclusions:

This graph class implementation is reasonably simple, fast, and robust. It supports persisting graph models to XML data files and retrieving back into graph instances. We have an interest in algorithms, based on this graph facility, that support investigations into the structure and dynamics of large systems. We will add algorithms we are working with to this package as they mature.