MLiPS 6: Pipeline Configuration

prototype policy-based reconfigurable pipeline

Mike Corley

Testing Pipeline Configuration:

Figure 7 shows a PowerShell script, FindTerms.ps1, that illustrates how a pipeline's operation is determined by stage policy. The script accepts a path, gets the files it contains by name, and, depending on each file's extension (.png or any other), dynamically configures the text extraction stage to use either Apache:Tika or AWS:TextExtract.

The final part of the script passes each discovered file through a PowerShell pipeline that the script itself defines. Partial results are shown at the bottom left of Figure 7.
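For readers without the figure at hand, the idea can be sketched in a few lines of PowerShell. This is not the FindTerms.ps1 script itself, only a minimal illustration under stated assumptions: the function names Invoke-TikaExtract, Invoke-TextractExtract, Select-ExtractStage, and Invoke-ExtractStage and the $ExtractPolicy table are all hypothetical, the two extractors are stand-ins that merely label their input rather than calling Apache Tika or AWS, and the choice to send .png files to the AWS extractor and everything else to Tika is an assumption about which way the policy runs.

```powershell
# Hypothetical stand-ins for the real extractors. The source does not show
# how Apache Tika or AWS are invoked, so these only label their input.
function Invoke-TikaExtract {
    param([Parameter(Mandatory)][string]$Path)
    "tika:$Path"
}

function Invoke-TextractExtract {
    param([Parameter(Mandatory)][string]$Path)
    "textract:$Path"
}

# Stage policy: maps a file extension to the script block that the text
# extraction stage should run. The .png mapping is an assumption.
$ExtractPolicy = @{
    '.png'    = { param($p) Invoke-TextractExtract -Path $p }
    'default' = { param($p) Invoke-TikaExtract -Path $p }
}

# Look up the script block for a file, falling back to the default entry.
function Select-ExtractStage {
    param([Parameter(Mandatory)][string]$Path)
    $ext = [System.IO.Path]::GetExtension($Path).ToLowerInvariant()
    if ($ExtractPolicy.ContainsKey($ext)) {
        $ExtractPolicy[$ext]
    }
    else {
        $ExtractPolicy['default']
    }
}

# Pipeline stage: for each incoming path, select the extractor by policy
# and invoke it, emitting its result to the next stage.
function Invoke-ExtractStage {
    [CmdletBinding()]
    param([Parameter(Mandatory, ValueFromPipeline)][string]$Path)
    process {
        $stage = Select-ExtractStage -Path $Path
        & $stage $Path
    }
}

# Each file name flows through the same stage; the policy picks the extractor.
'scan.png', 'report.pdf' | Invoke-ExtractStage
# textract:scan.png
# tika:report.pdf
```

In a real run the input would come from the file system, for example Get-ChildItem -File | ForEach-Object { $_.FullName } | Invoke-ExtractStage. Because the stage only looks up a script block in $ExtractPolicy, reassigning an entry in that table substitutes a different extractor without changing the pipeline itself.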

Conclusions:

This story has described the concept of a pipeline architecture for managing machine learning processes that extract metadata from, and classify, text documents in many different formats.

Its structure is based on PowerShell stages that support policy-based dynamic configuration by substitution. Proof-of-concept examples have been presented. They hint at the large impact that an implementation of this concept could have on the massive analysis tasks associated with recognizing and classifying documents.