Fig 1. Duplicates Class DataStore
Fig 2. Duplicates Class Diagram
1.0 Concept
Find the locations of all files whose names match those of one or more files in other directories.
Do so while storing each file name and each path only once, which saves space and processing time.
2.0 Design
Duplicates accepts a path defining the root of a directory tree to search. It looks for files that
match one or more patterns and whose names occur two or more times in that tree, but it does not
attempt to determine whether they have the same contents.
You can compare contents with the Diff_WPF tool, also found in the Tools category.
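The search described above can be sketched with the C++17 std::filesystem library. This is only an illustration under stated assumptions, not the tool's actual code: the names matchesAny and collectFiles are hypothetical, and a "pattern" is assumed here to be a plain file extension such as ".h", which may differ from the pattern syntax Duplicates really accepts.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical pattern test: a "pattern" is assumed to be a file extension like ".h".
// An empty pattern list accepts every file.
bool matchesAny(const fs::path& file, const std::vector<std::string>& extensions) {
  if (extensions.empty()) return true;
  const std::string ext = file.extension().string();
  return std::find(extensions.begin(), extensions.end(), ext) != extensions.end();
}

// Walk the tree under root and collect (file name, containing directory) pairs
// for every regular file that matches at least one pattern.
// File contents are never opened or compared.
std::vector<std::pair<std::string, std::string>>
collectFiles(const fs::path& root, const std::vector<std::string>& extensions) {
  assert(!root.empty());  // precondition: a search root must be supplied
  std::vector<std::pair<std::string, std::string>> found;
  const auto options = fs::directory_options::skip_permission_denied;
  for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root, options)) {
    if (!entry.is_regular_file()) continue;  // skip directories and special files
    const fs::path& p = entry.path();
    if (matchesAny(p, extensions))
      found.emplace_back(p.filename().string(), p.parent_path().string());
  }
  return found;
}
```

Each returned pair carries the two pieces of information the rest of the design needs: the file name, and the directory in which it was found.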
Duplicates uses a DataStore class holding a std::unordered_map to store results. The map's key is a file name, and its
value is a list of iterators pointing into a set of paths. That means that file names and paths are
not repeated in storage, saving both space and processing time.
Every file encountered is entered into the map, along with an iterator pointing to its path. When
processing is complete, all the files referencing two or more paths are reported as duplicates.
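A minimal sketch of such a store follows. It assumes the types described above (an unordered_map from file name to a list of iterators into a set of paths). The alias names and the member functions add, duplicates, pathsOf, and pathCount are illustrative assumptions, not the declarations used in the Duplicates source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative aliases (assumed names): each layer is built from the one before.
using Path     = std::string;
using File     = std::string;
using Paths    = std::set<Path>;                      // each path stored once
using PathIter = Paths::const_iterator;               // cheap reference to a stored path
using PathRefs = std::list<PathIter>;                 // all paths holding a given file name
using Store    = std::unordered_map<File, PathRefs>;  // each file name stored once

class DataStore {
public:
  // Record that a file called name was found in directory dir.
  void add(const File& name, const Path& dir) {
    assert(!name.empty());
    // insert() returns the existing element if dir is already in the set.
    // std::set iterators stay valid when other elements are inserted later.
    PathIter pathIt = paths_.insert(dir).first;
    PathRefs& refs = store_[name];
    if (std::find(refs.begin(), refs.end(), pathIt) == refs.end())
      refs.push_back(pathIt);
  }

  // Names referencing two or more paths, sorted for stable output.
  std::vector<File> duplicates() const {
    std::vector<File> names;
    for (const auto& [name, refs] : store_)
      if (refs.size() >= 2) names.push_back(name);
    std::sort(names.begin(), names.end());
    return names;
  }

  // Paths in which name was found, in the order they were added.
  std::vector<Path> pathsOf(const File& name) const {
    std::vector<Path> result;
    auto found = store_.find(name);
    if (found != store_.end())
      for (PathIter it : found->second) result.push_back(*it);
    return result;
  }

  std::size_t pathCount() const { return paths_.size(); }

private:
  Paths paths_;
  Store store_;
};
```

In this sketch, adding "a.h" from two directories and "b.h" from one of them stores two paths in total, and only "a.h" is reported as a duplicate. Storing iterators rather than path strings is what keeps each path from being repeated once per file.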
Note, from the Data Store Structure image, that a relatively sophisticated data structure is assembled
with just a few using statements. That illustrates the power of the STL-Containers design. It allows
developers to craft whatever storage mechanisms they need by simply "snapping together" a few
parts.
3.0 Build
Duplicates builds with Visual Studio 2019 Community Edition and was tested on Windows 10.
4.0 Status
Note that the command-line argument syntax, as shown in the Duplicates Output image, differs from that used in many of the other tools.
That is because Duplicates is older than many of them.
Eventually, it will be retrofitted with the CppUtilities ProcessCmdLine and will then use
the expected syntax.