The application domain spans many different frameworks, tools, methods, classes of algorithms,
and data handling requirements: lots of variance.
How can a workflow be defined so it's flexible and adaptable to changing requirements?
Use interfaces, and defer the specific (concrete) processing implementations to an
externally managed configuration/policy layer.
Define processing work flows as a sequence of stages.
Each stage exposes a contextually different processing interface.
Achieve flexibility by managing change externally, aided by object factories (see the sketch below).
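A minimal sketch of the factory idea, assuming a hypothetical stages.json policy file and hypothetical stage function names: the workflow depends only on the stage contract (objects in, objects out), while a small factory resolves each concrete stage by name from the externally managed configuration.

```powershell
# Hypothetical policy file, stages.json:
#   { "stages": [ "Invoke-DataExtraction", "Invoke-Cleaning", "Invoke-Prediction" ] }

# Factory: resolve a concrete stage implementation by name at runtime.
function New-PipelineStage {
    param([Parameter(Mandatory)][string]$StageName)
    # The caller never references a concrete type; only the stage contract matters.
    Get-Command -Name $StageName -ErrorAction Stop
}

# Build the workflow from the externally managed policy layer.
$policy = Get-Content -Raw -Path '.\stages.json' | ConvertFrom-Json
$stages = foreach ($name in $policy.stages) { New-PipelineStage -StageName $name }
```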
MLiPS Pipeline:
Plugins select user-defined processing:
each “stage” abstractly defines a step of the workflow by
exposing an interface, a contract for processing.
Dynamically rebind pipeline stages at runtime.
Each stage is built around a custom PowerShell cmdlet (sketched below, after the pipeline diagram).
Pipeline: => [ Data extraction | cleaning | formatting | transformation | { prediction or classification } ]
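A minimal sketch of one stage and of dynamic rebinding, reusing the hypothetical stage names above: each stage is an advanced function (cmdlet-style) honoring the same pipeline contract, and the workflow is rebound at runtime by chaining whatever stage list the policy layer supplied.

```powershell
# One stage: a cleaning step that reads records from the pipeline and emits
# cleaned records for the next stage. The record shape (a Text property) is
# an assumption for illustration.
function Invoke-Cleaning {
    [CmdletBinding()]
    param([Parameter(ValueFromPipeline)] $Record)
    process {
        $Record.Text = ($Record.Text -replace '\s+', ' ').Trim()
        $Record
    }
}

# Dynamic rebinding: chain whichever stages the configuration resolved,
# without editing any pipeline code.
$data = Get-ChildItem -Path '.\input' -File
foreach ($stage in $stages) {
    $data = $data | & $stage
}
```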
No explicit changes to code!
No proliferation of scripts.
Maximizes reuse.
Use any framework/algorithm as a plugin.
Combine different tools and frameworks so each is used for what it does best.
Note: PowerShell cmdlets can invoke Java applications such as the Tika and OpenNLP tools by wrapping
them in OS processes, and Java applications can, in turn, run PowerShell cmdlets. This interoperability
makes it plausible to incorporate external tools in a PowerShell-based framework.
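A minimal sketch of the wrapping idea, assuming the Apache Tika CLI jar (here tika-app.jar; the path and exact flags may differ by version) and a Java runtime are available on the machine:

```powershell
# Wrap the Tika CLI in an OS process so text extraction becomes a pipeline stage.
function Invoke-TikaExtraction {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)] [string] $Path,
        [string] $TikaJar = '.\tika-app.jar'   # assumed location of the Tika app jar
    )
    process {
        # Tika's --text switch writes the extracted plain text to stdout,
        # which PowerShell captures and passes down the pipeline.
        & java -jar $TikaJar --text $Path
    }
}

# Example use: (Get-ChildItem .\docs\*.pdf).FullName | Invoke-TikaExtraction
```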