12/01/2022

Duplicates Repository

Interesting composite STL data structure for storing duplicate file information

Quick Status
Code functions correctly: no known defects
Demonstration code: yes
Documentation: yes
Test cases: no
Static library: no
Build requires: C++17 option
Planned design changes: Update with new CodeUtilities

Fig 1. Duplicates Class DataStore
Fig 2. Duplicates Class Diagram
Fig 3. Duplicates Output

1.0 Concept

Find the locations of all files whose names match those of one or more files in other directories. Do so while storing each file name and each path only once, which saves space and CPU processing.

2.0 Design

Duplicates accepts a path defining the root of a directory tree to search. It looks for two or more files that share the same name and match one or more patterns, but does not attempt to determine whether they have the same contents.
You can check that with the Diff_WPF tool, also found in the Tools category.
Duplicates uses a DataStore class holding a std::unordered_map to store results. The map's key is a file name, and its value is a list of iterators pointing into a set of paths. That means that file names and paths are not repeated in storage, saving both space and processing time.
Every file encountered is entered into the map, along with an iterator pointing to its path. When processing is complete, all the files referencing two or more paths are reported as duplicates.
Note, from the DataStore figure (Fig 1), that a relatively sophisticated data structure is assembled with just a few using statements. That illustrates the power of the STL containers design. It allows developers to craft whatever storage mechanisms they need by simply "snapping together" a few parts.
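A minimal sketch of such a store, assuming only what the text above describes: an unordered_map keyed by file name whose values are lists of iterators into a set of paths. The alias names and the members add, duplicates, and pathCount are hypothetical, since the real DataStore interface is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical aliases; the real DataStore may name these differently.
using Path     = std::string;
using File     = std::string;
using PathSet  = std::set<Path>;                      // each path stored once
using PathIter = PathSet::const_iterator;             // reference to a stored path
using PathRefs = std::list<PathIter>;                 // paths where a file was seen
using FileMap  = std::unordered_map<File, PathRefs>;  // each file name stored once

class DataStore {
public:
  // Record that a file called name was found in directory path.
  void add(const File& name, const Path& path) {
    // insert() returns an iterator to the existing element if path is already stored.
    // std::set iterators remain valid when other elements are inserted later.
    PathIter it = paths_.insert(path).first;
    PathRefs& refs = files_[name];
    // Ignore a repeated (name, path) pair so it is not reported as a duplicate.
    if (std::find(refs.begin(), refs.end(), it) == refs.end()) {
      refs.push_back(it);
    }
  }

  // File names that reference two or more paths, with those paths, sorted by name.
  std::map<File, std::vector<Path>> duplicates() const {
    std::map<File, std::vector<Path>> result;
    for (const auto& [name, refs] : files_) {
      if (refs.size() < 2) continue;
      std::vector<Path>& out = result[name];
      for (PathIter it : refs) out.push_back(*it);
    }
    return result;
  }

  std::size_t pathCount() const { return paths_.size(); }

private:
  PathSet paths_;
  FileMap files_;
};
```

Adding "a.h" under both "/src" and "/include", and "b.h" under "/src" only, stores two paths in total and reports only "a.h" as a duplicate. A std::set suits this role because its iterators stay valid as more paths are inserted, so the iterators held in the map never dangle.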

3.0 Build

Duplicates builds with Visual Studio 2019 Community Edition and was tested on Windows 10.

4.0 Status

Note that the command line argument syntax, as shown in the Duplicates Output image (Fig 3), is different from that used in many of the other tools. That is because Duplicates is older than many of the others. Eventually, it will be retrofitted with the CppUtilities ProcessCmdLine and will then use the expected syntax.