Design Bite - DataFlow Structure
pipelined line-counter
"Begin at the beginning," the King said gravely, "and go on till you come to the end;
then stop."
- Lewis Carroll, Alice in Wonderland
1. Introduction
This DesignBite sequence was inspired by BuildOn project TextFinder.
As that project is designed and implemented, a number of design decisions are made, consciously or unconsciously.
Each page in the sequence presents one answer to the fundamental question of how to structure the program.
To keep the discussion concrete, each page implements the same program, which counts the lines
in text files. The processing is simple, so it is easy to see how each
structure alternative is implemented.
We consider both package structure and logical structure, e.g., the functions and structs used to organize
design and implementation. On this DataFlow Structure page, the code is implemented in a set of packages,
Executive, Input, Compute, and Output, and their
structs. Those provide all of the organization for processing.
2. Application Structure - Dataflow
This structure is modular and organizes its parts as a data flow pipeline. It differs from the previous factored structure
in that:
- Output can now be shown to the user while processing continues. This is often a very big ergonomic advantage.
- The Executive no longer owns all of the parts. Now, Input owns Compute and Compute owns Output.
- Testing becomes more complicated because each of the non-Executive parts must provide a test mock for the part to which it sends output.
Figure 1. DataFlow Logical Structure
Data Flow Structure
Data flow structure is designed to provide continuing output to users while the application
is running, not just at the end. For programs that process a lot of data and may
run for a while, continuous display is much more satisfactory for the user. It removes
questions like: Is it still running? Am I getting the output I want? Did the program crash?
Data flow structure changes ownership. Instead of the Executive owning everything, a pipeline
is set up where each element of the pipeline owns the next element in the sequence.
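The ownership chain described above can be sketched in a single file. This is a minimal illustration, not the repository's code: `Head`, `Middle`, and `Tail` and their methods are hypothetical stand-ins for Input, Compute, and Output, and `Tail` records what it receives so the flow can be observed.

```rust
// Hypothetical stand-ins for Input, Compute, and Output (names are assumptions).

/// Last stage: receives each result as soon as it is produced.
#[derive(Debug, Default)]
struct Tail {
    shown: Vec<String>,
}

impl Tail {
    fn accept(&mut self, name: &str, lines: usize) {
        let msg = format!("{} has {} lines", name, lines);
        println!("{}", msg); // displayed per item, not at the end of the run
        self.shown.push(msg);
    }
}

/// Middle stage: owns the last stage and pushes results into it.
#[derive(Debug, Default)]
struct Middle {
    tail: Tail,
}

impl Middle {
    fn process(&mut self, name: &str, text: &str) -> usize {
        let lines = text.lines().count();
        self.tail.accept(name, lines);
        lines
    }
}

/// First stage: owns the middle stage; it is the only part a caller creates.
#[derive(Debug, Default)]
struct Head {
    middle: Middle,
}

impl Head {
    fn feed(&mut self, name: &str, text: &str) -> usize {
        self.middle.process(name, text)
    }
}

fn main() {
    let mut head = Head::default(); // constructing Head builds the whole chain
    let total = head.feed("a.txt", "one\ntwo\n") + head.feed("b.txt", "x\n");
    println!("total lines: {}", total);
    println!("results shown so far: {}", head.middle.tail.shown.len());
}
```

The caller only ever touches `Head`. Each result is displayed by `Tail` before the next item is fed in, which is the property the data flow structure is after.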
Pros:
- Continuous output
- Data makes fewer passes, e.g., results need not return to the Executive for display
Cons:
- Harder to implement and test piece by piece
- Most data flow applications will need test mocks
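One way to supply such test mocks is to have a stage send its results through a trait, so a test can substitute a recording mock for the real downstream part. The sketch below is an assumption about how that could look, not how the repository does it: the `Sink` trait, `ConsoleSink`, `MockSink`, and `LineCounter` are all hypothetical names.

```rust
/// Hypothetical interface for whatever receives a stage's results.
trait Sink {
    fn send(&mut self, name: &str, lines: usize);
}

/// Production sink: writes to the console.
struct ConsoleSink;

impl Sink for ConsoleSink {
    fn send(&mut self, name: &str, lines: usize) {
        println!(" file {:?} has {} lines of code", name, lines);
    }
}

/// Test mock: records every call instead of printing.
#[derive(Default)]
struct MockSink {
    calls: Vec<(String, usize)>,
}

impl Sink for MockSink {
    fn send(&mut self, name: &str, lines: usize) {
        self.calls.push((name.to_string(), lines));
    }
}

/// A compute-like stage that owns its sink and is generic over the sink type.
struct LineCounter<S: Sink> {
    sink: S,
}

impl<S: Sink> LineCounter<S> {
    fn new(sink: S) -> Self {
        LineCounter { sink }
    }

    /// Counts lines in `text` and forwards the result downstream.
    fn count(&mut self, name: &str, text: &str) -> usize {
        let lines = text.lines().count();
        self.sink.send(name, lines);
        lines
    }
}

fn main() {
    // Production wiring: results go to the console.
    let mut real = LineCounter::new(ConsoleSink);
    real.count("demo.txt", "a\nb\nc\n");

    // Test wiring: results are captured for inspection.
    let mut mocked = LineCounter::new(MockSink::default());
    mocked.count("demo.txt", "a\nb\nc\n");
    assert_eq!(mocked.sink.calls, vec![("demo.txt".to_string(), 3)]);
}
```

The price is a generic parameter on the owning stage. A boxed trait object field would be another option if the type parameter is unwelcome.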
DataFlow Code Repository
Executive::main.rs
/////////////////////////////////////////////////////////////
// DataFlowStructure::Executive::main.rs //
// - Executive creates Input; the pipeline self-installs //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
/*
Note:
Executive only creates Input instance. The rest of
the pipeline self installs, e.g., Input creates Compute,
and Compute creates Output.
*/
use input::*;
fn main() {
let putln = || println!();
print!("\n -- DataFlowStructure::Executive --\n");
let mut lines = 0;
let mut inp = Input::new();
let name = "./src/main.rs";
lines += inp.do_input(name);
putln();
let name = "../Input/src/lib.rs";
lines += inp.do_input(name);
let name = "../Input/examples/test1.rs";
lines += inp.do_input(name);
putln();
let name = "../Compute/src/lib.rs";
lines += inp.do_input(name);
let name = "../Compute/examples/test1.rs";
lines += inp.do_input(name);
putln();
let name = "../Output/src/lib.rs";
lines += inp.do_input(name);
let name = "../Output/examples/test1.rs";
lines += inp.do_input(name);
putln();
print!("\n total lines: {}", lines);
print!("\n\n That's all Folks!\n\n");
}
Output
-- DataFlowStructure::Executive --
file "./src/main.rs" has 48 lines of code
file "../Input/src/lib.rs" has 51 lines of code
file "../Input/examples/test1.rs" has 17 lines of code
file "../Compute/src/lib.rs" has 56 lines of code
file "../Compute/examples/test1.rs" has 29 lines of code
file "../Output/src/lib.rs" has 26 lines of code
file "../Output/examples/test1.rs" has 15 lines of code
total lines: 242
That's all Folks!
Cargo.toml
[package]
name = "executive"
version = "0.1.0"
authors = ["James W. Fawcett"]
edition = "2018"
# See more keys ...
[dependencies]
input = { path = "../Input" }
Input::lib.rs
/////////////////////////////////////////////////////////////
// DataFlowStructure::Input::lib.rs //
// - Attempts to return line count from file //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
/*
Note:
- Input owns and instantiates Compute.
- It attempts to open file and pass to Compute for
processing.
- Returns line count if successful
*/
use compute::*;
mod file_utilities;
use file_utilities::{open_file_for_read};
#[derive(Debug)]
pub struct Input {
name: String,
compute: Compute
}
impl Input {
pub fn new() -> Input {
Input {
name: String::new(),
compute: Compute::new()
}
}
pub fn do_input(&mut self, name: &str) -> usize {
let mut lines: usize = 0;
self.name = name.to_string();
let rslt = open_file_for_read(name);
if let Ok(file) = rslt {
self.compute.do_compute(name, file);
lines = self.compute.lines();
}
else {
print!("\n can't open file {:?}", name);
}
lines
}
}
#[cfg(test)]
mod tests {
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
}
file_utilities module
/////////////////////////////////////////////////////////////
// DataFlowStructure::Input::file_utilities.rs //
// - Input attempts to open named file and return File //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
/*
This code may be useful for other programs so it is
factored into a module here.
*/
#![allow(dead_code)]
use std::fs::{File, OpenOptions};
use std::io::Read;
// Opens the named file read-only; the caller handles any error.
pub fn open_file_for_read(file_name: &str)
    -> Result<File, std::io::Error> {
    OpenOptions::new()
        .read(true)
        .open(file_name)
}
// Reads the whole file into a String and passes any underlying
// I/O error (including invalid UTF-8) back to the caller.
pub fn read_file_to_string(f: &mut File)
    -> Result<String, std::io::Error> {
    let mut contents = String::new();
    f.read_to_string(&mut contents)?;
    Ok(contents)
}
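read_file_to_string loads the whole file before any counting starts. For large files, a stage could instead count lines as they stream through a buffered reader. The following is a sketch under that assumption: `count_lines_streaming` is a hypothetical helper, not part of the repository, and it counts lines the way `BufRead::lines` splits them, which can differ from the newline-based count used in Compute.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Hypothetical helper: counts lines from any buffered reader without
/// holding the whole input in memory.
fn count_lines_streaming<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        line?; // propagate read errors, including invalid UTF-8
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for a BufReader wrapped around an open File.
    let reader = BufReader::new(Cursor::new("first\nsecond\nthird\n"));
    let n = count_lines_streaming(reader)?;
    println!("counted {} lines", n);
    Ok(())
}
```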
test1.rs
/////////////////////////////////////////////////////////////
// DataFlowStructure::Input::test1.rs //
// - Attempts to return line count from file //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
use input::*;
fn main() {
print!("\n -- input::test1 --\n");
let mut inp = Input::new();
let name = "./src/lib.rs";
let lines = inp.do_input(name);
print!("\n received {} lines from compute", lines);
print!("\n\n That's all Folks!\n\n");
}
Test Output
-- input::test1 --
file "./src/lib.rs" has 51 lines of code
received 51 lines from compute
That's all Folks!
Cargo.toml
[package]
name = "input"
version = "0.1.0"
authors = ["James W. Fawcett"]
edition = "2018"
# See more keys ...
[dependencies]
compute = { path = "../Compute" }
Compute::lib.rs
/////////////////////////////////////////////////////////////
// DataFlowStructure::Compute::lib.rs //
// - Attempts to read opened file to string, count lines //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
/*
Note:
- creates instance of Output
- attempts to read file to string and count its lines
- sends results to Output
*/
use std::fs::*;
use output::{Output};
mod file_utilities;
use file_utilities::read_file_to_string;
#[derive(Debug)]
pub struct Compute {
lines: usize,
out: Output
}
impl Compute {
pub fn new() -> Compute {
Compute {
lines: 0,
out: Output::new()
}
}
pub fn do_compute(&mut self, name: &str, mut file: File) {
    // Reset first so a failed read cannot report the previous file's count.
    self.lines = 0;
    let rslt = read_file_to_string(&mut file);
    if let Ok(contents) = rslt {
        // Count is 1 + number of newline characters.
        self.lines = 1 + contents.chars().filter(|&ch| ch == '\n').count();
        self.out.do_output(name, self.lines);
    }
    else {
        print!("\n could not read {:?}", name);
    }
}
pub fn lines(&self) -> usize {
self.lines
}
}
#[cfg(test)]
mod tests {
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
}
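do_compute starts its count at 1 and adds one for each newline character. A file that ends with a newline is therefore reported with one more line than `str::lines` would give, and an empty file is reported as 1. The comparison below is a small sketch that makes the convention visible; both function names are hypothetical.

```rust
/// Mirrors the counting rule used in do_compute: 1 + number of '\n'.
fn newline_plus_one(contents: &str) -> usize {
    1 + contents.chars().filter(|&ch| ch == '\n').count()
}

/// Alternative rule: number of lines as split by the standard library.
fn std_lines(contents: &str) -> usize {
    contents.lines().count()
}

fn main() {
    for text in ["", "a", "a\nb", "a\nb\n"] {
        println!(
            "{:?}: newline_plus_one = {}, std_lines = {}",
            text,
            newline_plus_one(text),
            std_lines(text)
        );
    }
}
```

Either rule is defensible. What matters is that every stage and every test agrees on which one is in use.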
file_utilities module
This module is a copy of the one in Input/src.
test1.rs
/////////////////////////////////////////////////////////////
// DataFlowStructure::Compute::test1.rs //
// - Attempts to read opened file to string, count lines //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
use compute::*;
use std::fs::*;
use std::io::*;
// Local copy of the helper so this example does not depend on Input.
fn open_file_for_read(file_name: &str)
    -> Result<File> {
    OpenOptions::new()
        .read(true)
        .open(file_name)
}
fn main() {
    print!("\n -- compute::test1 --\n");
    let name = "./src/lib.rs";
    let rslt = open_file_for_read(name);
    if let Ok(file) = rslt {
        let mut compute = Compute::new();
        compute.do_compute(name, file);
    }
    else {
        // Report the failure instead of exiting silently.
        print!("\n can't open file {:?}", name);
    }
    print!("\n\n That's all Folks!\n\n");
}
Output
-- compute::test1 --
file "./src/lib.rs" has 56 lines of code
That's all Folks!
Cargo.toml
[package]
name = "compute"
version = "0.1.0"
authors = ["James W. Fawcett"]
edition = "2018"
# See more keys ...
[dependencies]
output = { path = "../Output" }
Output::lib.rs
/////////////////////////////////////////////////////////////
// DataFlowStructure::Output::lib.rs //
// - Sends results to console //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
#[derive(Debug)]
pub struct Output {
}
impl Output {
pub fn new() -> Output {
Output {}
}
pub fn do_output(&self, name: &str, lines: usize) {
print!(
"\n file {:?} has {} lines of code", name, lines
);
}
}
#[cfg(test)]
mod tests {
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
}
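do_output uses print! with a leading newline and no trailing one. Because Rust's stdout is line-buffered, the text for a file may not reach the console until the next newline is written or the program exits. If prompt display matters, one option is to write to a generic `std::io::Write` target and flush after each message. The sketch below assumes a hypothetical free function `write_result`; it is not part of the Output crate.

```rust
use std::io::{self, Write};

/// Hypothetical variant of do_output: writes one result to any writer and
/// flushes so the message is pushed out immediately.
fn write_result<W: Write>(out: &mut W, name: &str, lines: usize) -> io::Result<()> {
    write!(out, "\n file {:?} has {} lines of code", name, lines)?;
    out.flush()
}

fn main() -> io::Result<()> {
    // Console use: lock stdout once, then write and flush per result.
    let stdout = io::stdout();
    let mut handle = stdout.lock();
    write_result(&mut handle, "SomeFile.rs", 3)?;
    writeln!(handle)?;

    // The same function can fill an in-memory buffer, which is handy in tests.
    let mut buf: Vec<u8> = Vec::new();
    write_result(&mut buf, "SomeFile.rs", 3)?;
    assert_eq!(
        String::from_utf8(buf).unwrap(),
        "\n file \"SomeFile.rs\" has 3 lines of code"
    );
    Ok(())
}
```

Writing through a `Write` parameter also gives tests something to inspect, which a direct print! does not.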
test1.rs
/////////////////////////////////////////////////////////////
// DataFlowStructure::Output::test1.rs //
// - Sends results to console //
// Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021 //
/////////////////////////////////////////////////////////////
use output::*;
fn main() {
print!("\n -- test Output --\n");
let out = Output::new();
out.do_output("SomeFile.rs", 3);
print!("\n That's all Folks!\n\n");
}
Test Output
-- test Output --
file "SomeFile.rs" has 3 lines of code
That's all Folks!
Cargo.toml
[package]
name = "output"
version = "0.1.0"
authors = ["James W. Fawcett"]
edition = "2018"
# See more keys ...
[dependencies]
3. Epilogue
The five design alternatives considered here:
- Monolithic Structure
- Factored Structure
- DataFlow Structure
- TypeErase Structure
- PlugIn Structure
are progressively more flexible, eventually resulting in reusable components, but also increasingly
complex. Where you settle in these alternatives is determined by design context. Is this a
one-of-a-kind project that you want to finish quickly or is it
heading for production code that will be maintained by more than one developer?