MLiPS 1
11/21/2024

MLiPS 1: Foreword

(M)achine (L)earning (i)ntegrated with (P)ower(S)hell pipelines


Foreword:

Mike Corley, the MLiPS author, is a colleague and former student who works as a software developer and information technology consultant. He is interested in building a framework for managing machine learning processes that classify large sets of documents. Typically these processes include:
  • Collection of sources of data.
  • Extracting objects from the sources, using one of several parsing applications, for analysis and classification.
  • Applying one or more of a very large collection of analysis/classification algorithms.
  • Converting the results into information for humans, e.g., extracting meta-data, removing redundant or inappropriate data items, forming summaries, gathering relationships between data items.
  • Projecting results through context-dependent viewers: analysis of the process, information streams for users, results summaries.
  • Forwarding to long-term storage.
  • Saving a unique record of the steps above to enable replication.
Traditionally, these processes have been managed with text-based Bash scripts or custom Python programs. The result is often a proliferation of scripts, and a past process is difficult to recreate or clone.
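The last step listed above, saving a unique record of each run, is what makes replication possible. A minimal sketch of such a record follows, assuming each step can be described by its stage, the plug-in that ran it, and that plug-in's parameters. RunRecord, record_step, run_id and the plug-in names are hypothetical, chosen only for illustration; hashing a deterministic serialization is one common way to give a run a unique identity, not necessarily the approach MLiPS uses.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical record of one pipeline run, kept so the run can be replicated."""
    steps: list = field(default_factory=list)

    def record_step(self, stage: str, plugin: str, params: dict) -> None:
        # Store only what is needed to re-create the step: stage, plug-in name, parameters.
        self.steps.append({"stage": stage, "plugin": plugin, "params": params})

    def to_json(self) -> str:
        # sort_keys makes the serialization deterministic, so identical runs give identical text.
        return json.dumps(self.steps, sort_keys=True)

    def run_id(self) -> str:
        # A content hash of the serialized steps serves as the run's unique identifier.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


record = RunRecord()
record.record_step("collect", "FileCollector", {"root": "docs/"})
record.record_step("parse", "PdfParser", {"ocr": False})
record.record_step("analyze", "NaiveBayes", {"alpha": 1.0})
print(record.run_id()[:12])
```

Because the identifier depends only on the recorded steps, two runs configured the same way share an id, and the saved JSON is enough to rebuild the run.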

Concept:

Mike's goal is to build a framework for managing the setup and execution of specific instances of this process using PowerShell cmdlets linked in a pipeline architecture. He intends to create a single fixed framework with plug-in components that automate the process:
  • Support each process step listed above with plug-in components taken from a gallery, or with a new plug-in that is registered in the gallery once it has proven useful in the current process.
  • Use interfaces and object factories for each processing stage to isolate the framework mechanics from application-specific processing. This makes plug-ins substitutable, so each plug-in must implement the interface and object factory for its processing stage: collection, parsing, cleaning, analyzing, viewing, ...
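To make the interface-plus-factory idea concrete, here is a minimal sketch in Python. In MLiPS itself the stages would be .NET cmdlet classes; the names Stage, gallery, register, create, run, DropEmpty and WordCount are hypothetical. The framework only ever calls the Stage interface and builds stages by name through the factory, so a registered plug-in can be swapped for another without touching the framework code.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator


class Stage(ABC):
    """Hypothetical interface that every plug-in for a processing stage implements."""

    @abstractmethod
    def process(self, items: Iterable) -> Iterator:
        """Consume objects from the previous stage and yield objects for the next one."""


# The "gallery": stage kind -> plug-in name -> factory that builds a Stage.
gallery: Dict[str, Dict[str, Callable[..., Stage]]] = {}


def register(kind: str, name: str):
    """Decorator that registers a plug-in's factory in the gallery under its stage kind."""
    def wrap(factory: Callable[..., Stage]) -> Callable[..., Stage]:
        gallery.setdefault(kind, {})[name] = factory
        return factory
    return wrap


def create(kind: str, name: str, **params) -> Stage:
    """Object factory: the framework builds stages by name, never by concrete class."""
    return gallery[kind][name](**params)


@register("clean", "drop_empty")
class DropEmpty(Stage):
    def process(self, items):
        # Drop blank text items.
        return (item for item in items if item.strip())


@register("analyze", "word_count")
class WordCount(Stage):
    def __init__(self, lowercase: bool = True):
        self.lowercase = lowercase

    def process(self, items):
        # Emit an object per item rather than a line of text.
        for item in items:
            text = item.lower() if self.lowercase else item
            yield {"text": text, "words": len(text.split())}


def run(stages, items):
    # The framework sees only the Stage interface, so any registered plug-in fits.
    for stage in stages:
        items = stage.process(items)
    return list(items)


pipeline = [create("clean", "drop_empty"), create("analyze", "word_count")]
print(run(pipeline, ["Hello World", "   ", "One two three"]))
# [{'text': 'hello world', 'words': 2}, {'text': 'one two three', 'words': 3}]
```

Adding a plug-in to the gallery is just registering another factory under an existing stage kind; the run function does not change.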
Concept for Implementation:
  • Since the pipeline traffics in objects, PowerShell cmdlets are an effective mechanism for implementing pipeline stages.
  • A PowerShell cmdlet is an instance of a .NET class that derives from System.Management.Automation.Cmdlet or from PSCmdlet, which itself derives from Cmdlet.
  • As a .NET class instance, its methods send and receive objects, not just text streams. Its base Cmdlet class provides hooks, such as BeginProcessing, ProcessRecord, and EndProcessing, that integrate it with the PowerShell execution environment.
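PowerShell calls a cmdlet's BeginProcessing once, ProcessRecord once for each input object, and EndProcessing once after the input is exhausted. The sketch below mimics that lifecycle in Python so the object flow between stages is easy to see; it is an analogue, not the PowerShell API, and CmdletLike, invoke_pipeline, Classify and Summarize are hypothetical names. Note that each stage receives and emits dictionaries (objects with properties), not lines of text.

```python
from collections import Counter


class CmdletLike:
    """Python analogue of a cmdlet's lifecycle hooks; not the PowerShell API itself."""

    def begin_processing(self):
        """Analogue of BeginProcessing: called once, before any input arrives."""

    def process_record(self, obj):
        """Analogue of ProcessRecord: called once per input object; returns output objects."""
        return []

    def end_processing(self):
        """Analogue of EndProcessing: called once, after the last input object."""
        return []


def _stream(cmdlet, upstream):
    # Each object is handed downstream as soon as it is produced, not after the whole batch.
    for obj in upstream:
        yield from cmdlet.process_record(obj)
    yield from cmdlet.end_processing()


def invoke_pipeline(cmdlets, inputs):
    for cmdlet in cmdlets:
        cmdlet.begin_processing()  # every stage's begin hook runs before any input flows
    stream = iter(inputs)
    for cmdlet in cmdlets:
        stream = _stream(cmdlet, stream)
    return list(stream)


class Classify(CmdletLike):
    """Hypothetical stage: tags each document object with a label."""

    def __init__(self, keyword):
        self.keyword = keyword

    def process_record(self, doc):
        label = "match" if self.keyword in doc["text"] else "other"
        return [{**doc, "label": label}]  # a new object with an added property


class Summarize(CmdletLike):
    """Hypothetical stage: accumulates counts and emits one summary object at the end."""

    def begin_processing(self):
        self.counts = Counter()

    def process_record(self, doc):
        self.counts[doc["label"]] += 1
        return []  # nothing is emitted per record

    def end_processing(self):
        return [dict(self.counts)]


docs = [
    {"id": 1, "text": "invoice due"},
    {"id": 2, "text": "meeting notes"},
    {"id": 3, "text": "invoice paid"},
]
print(invoke_pipeline([Classify("invoice"), Summarize()], docs))
# [{'match': 2, 'other': 1}]
```

The Summarize stage shows why the end hook matters: an aggregating stage can emit its result only after it has seen every object.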

MLiPS Story

Machine Learning integrated with PowerShell (MLiPS) is an idea, developed by Mike Corley, for managing a complex process whose many parts change from application to application. This story summarizes the concept and may eventually be extended to describe an implementation.