Figure 7 shows a PowerShell script, FindTerms.ps1, that illustrates how stage policy determines
a pipeline operation. The script accepts a path, retrieves the files it contains by name, and,
depending on each file's extension (.png or any other), dynamically configures the text
extraction stage to use either Apache:Tika or AWS:TextExtract.
The bottom of the script passes each discovered file to a PowerShell pipeline defined by the script.
Partial results are shown at the bottom left of Figure 7.
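
The policy-driven substitution described for Figure 7 can be approximated with a minimal sketch. This is not the FindTerms.ps1 script itself: the names $StagePolicy, $Stages, Select-ExtractionStage, and Invoke-TextExtraction are hypothetical, the extractors are stubs that only return a label, and the assignment of .png files to AWS:TextExtract (with everything else going to Apache:Tika) is an assumption made for illustration.

```powershell
# Hypothetical policy table: maps a file extension to the name of an extraction stage.
# The stage names echo the text; which extension uses which stage is an assumption.
$StagePolicy = @{
    '.png'    = 'AWS:TextExtract'
    'default' = 'Apache:Tika'
}

# Stub stages. A real implementation would call the corresponding tool or service here.
$Stages = @{
    'Apache:Tika'     = { param($File) "[Apache:Tika] text of $File" }
    'AWS:TextExtract' = { param($File) "[AWS:TextExtract] text of $File" }
}

# Look up the stage name for a file, falling back to the default policy entry.
function Select-ExtractionStage {
    param([Parameter(Mandatory)][string]$Path)
    $ext = [System.IO.Path]::GetExtension($Path).ToLowerInvariant()
    if ($StagePolicy.ContainsKey($ext)) { $StagePolicy[$ext] } else { $StagePolicy['default'] }
}

# Pipeline stage: for each incoming path, substitute the extractor chosen by policy.
function Invoke-TextExtraction {
    param([Parameter(Mandatory, ValueFromPipeline)][string]$Path)
    process {
        $stageName = Select-ExtractionStage -Path $Path
        $text = & $Stages[$stageName] $Path
        [pscustomobject]@{
            File  = $Path
            Stage = $stageName
            Text  = $text
        }
    }
}

# Usage: file names are piped in; each one is routed to the stage its extension selects.
$results = 'scan.png', 'report.docx' | Invoke-TextExtraction
$results | Format-Table File, Stage
```

Keeping the policy in a lookup table rather than in branching code means a stage can be swapped by editing one entry, which is the substitution idea the figure is meant to convey.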
Conclusions:
This story has described a concept for a pipeline architecture that manages machine learning
processes designed to extract metadata from, and classify, text documents in many different formats.
Its structure is based on PowerShell stages that support policy-based dynamic configuration
by substitution. Proof-of-concept examples have been presented. They hint at the large impact that
an implementation of this concept could have on the massive analysis tasks associated with
recognizing and classifying documents.