3/07/2022
DataFlow Structure

Design Bite - DataFlow Structure

pipelined line-counter

"Begin at the beginning, the King said gravely, and go on till you come to the end; then stop."
- Lewis Carroll, Alice in Wonderland

1. Introduction

This DesignBite sequence was inspired by the BuildOn project TextFinder. As that project is designed and implemented, a number of design decisions are made, consciously or unconsciously. Each page in the sequence presents one answer to the question of how to structure the program. To keep the discussion pragmatic and concrete, we implement a program that counts the lines in text files. The processing is simple, so it is easy to see how each structure alternative is implemented. We consider both package structure and logical structure, e.g., the functions and structs used to organize design and implementation. In this DataFlow Structure page, code is implemented in a set of packages, Executive, Input, Compute, and Output, and their structs. Together they provide all of the organization for processing.

2. Application Structure - Dataflow

This structure is modular, with its parts arranged as a data flow pipeline. It differs from the previous Factored Structure in that:
  • Output can now be shown to the user while processing continues. This is often a very big ergonomic advantage.
  • The Executive no longer owns all of the parts. Now, Input owns Compute and Compute owns Output.
  • Testing becomes more complicated because each of the non-Executive parts must provide a test mock for the part to which it sends output.
Figure 1. DataFlow Logical Structure

Data Flow Structure

Data flow structure is designed to provide continuing output to users while the application is running, i.e., not just at the end. For programs that process a lot of data and may run for a while, continuous display is much more satisfactory for the user: it answers questions like "Is it still running?", "Am I getting the output I want?", and "Did the program crash?" Data flow structure also changes ownership. Instead of the Executive owning everything, a pipeline is set up in which each element owns the next element in the sequence.
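The ownership chain described above can be sketched in a single file. This is only an illustration, not the repository code: it takes text directly instead of opening files, and the `shown` record, the accessor methods, and `run_pipeline` are assumptions added so the behavior can be checked.

```rust
// Sketch only: mirrors the Input -> Compute -> Output ownership chain,
// but takes text directly instead of opening files (an assumption made
// to keep the example self-contained).

/// Last stage: reports each result as soon as it arrives.
pub struct Output {
    shown: Vec<String>, // hypothetical record of what was displayed
}

impl Output {
    pub fn new() -> Self {
        Output { shown: Vec::new() }
    }
    pub fn do_output(&mut self, name: &str, lines: usize) {
        let msg = format!("file {:?} has {} lines of code", name, lines);
        println!("  {}", msg); // shown immediately, not at the end of the run
        self.shown.push(msg);
    }
    pub fn shown(&self) -> &[String] {
        &self.shown
    }
}

/// Middle stage: owns Output and forwards each count to it.
pub struct Compute {
    lines: usize,
    out: Output,
}

impl Compute {
    pub fn new() -> Self {
        Compute { lines: 0, out: Output::new() }
    }
    pub fn do_compute(&mut self, name: &str, contents: &str) {
        // Same counting rule as the article's Compute: 1 + number of '\n'.
        self.lines = 1 + contents.chars().filter(|&c| c == '\n').count();
        self.out.do_output(name, self.lines);
    }
    pub fn lines(&self) -> usize {
        self.lines
    }
    pub fn output(&self) -> &Output {
        &self.out
    }
}

/// First stage: owns Compute. The Executive creates only this.
pub struct Input {
    compute: Compute,
}

impl Input {
    pub fn new() -> Self {
        Input { compute: Compute::new() }
    }
    pub fn do_input(&mut self, name: &str, contents: &str) -> usize {
        self.compute.do_compute(name, contents);
        self.compute.lines()
    }
    pub fn compute(&self) -> &Compute {
        &self.compute
    }
}

/// Plays the Executive's role: builds only Input and sums the counts.
pub fn run_pipeline(files: &[(&str, &str)]) -> usize {
    let mut inp = Input::new();
    files.iter().map(|(name, text)| inp.do_input(name, text)).sum()
}
```

Calling `run_pipeline` with a slice of (name, text) pairs prints one result per entry as it is processed and returns the total, which is the role the Executive plays in the real application.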

Pros:

  1. Continuous output
  2. Data makes fewer passes, e.g., it doesn't need to go back to the Executive

Cons:

  1. Harder to implement and test piece by piece
  2. Most data flow applications will need test mocks
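One possible way to supply such a mock in Rust is to place a trait between a stage and its successor. The sketch below is an assumption, not the approach used in the repository, where Compute holds a concrete Output. The names `Sink`, `ConsoleOutput`, `MockOutput`, and `sink` are hypothetical, and Compute takes text rather than a File to keep the example self-contained.

```rust
// Sketch only: `Sink`, `ConsoleOutput`, and `MockOutput` are hypothetical
// names; the article's Compute holds a concrete Output instead.

/// What Compute needs from whatever follows it in the pipeline.
pub trait Sink {
    fn do_output(&mut self, name: &str, lines: usize);
}

/// Production sink: writes to the console, as the article's Output does.
pub struct ConsoleOutput;

impl Sink for ConsoleOutput {
    fn do_output(&mut self, name: &str, lines: usize) {
        print!("\n  file {:?} has {} lines of code", name, lines);
    }
}

/// Test mock: records calls so a test can inspect them.
#[derive(Default)]
pub struct MockOutput {
    pub calls: Vec<(String, usize)>,
}

impl Sink for MockOutput {
    fn do_output(&mut self, name: &str, lines: usize) {
        self.calls.push((name.to_string(), lines));
    }
}

/// Compute still owns the next stage, but is generic over its type.
pub struct Compute<S: Sink> {
    lines: usize,
    out: S,
}

impl<S: Sink> Compute<S> {
    pub fn new(out: S) -> Self {
        Compute { lines: 0, out }
    }
    pub fn do_compute(&mut self, name: &str, contents: &str) {
        // Same counting rule as the article: 1 + number of '\n'.
        self.lines = 1 + contents.matches('\n').count();
        self.out.do_output(name, self.lines);
    }
    pub fn lines(&self) -> usize {
        self.lines
    }
    pub fn sink(&self) -> &S {
        &self.out
    }
}
```

With this shape, a test builds `Compute::new(MockOutput::default())` and inspects `calls` afterward, while production code passes `ConsoleOutput`. The cost is an extra trait and a type parameter on every stage that has a successor, which is part of why this structure is harder to test piece by piece.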

DataFlow Code Repository
Executive::main.rs

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Executive::main.rs                   //
    //   - Executive creates and uses all lower level parts    //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////
    /*
      Note:
      Executive only creates Input instance.  The rest of the
      pipeline self installs, e.g., Input creates Compute, and
      Compute creates Output.
    */
    use input::*;

    fn main() {
        let putln = || println!();

        print!("\n -- DataFlowStructure::Executive --\n");

        let mut lines = 0;
        let mut inp = Input::new();

        let name = "./src/main.rs";
        lines += inp.do_input(name);
        putln();

        let name = "../Input/src/lib.rs";
        lines += inp.do_input(name);
        let name = "../Input/examples/test1.rs";
        lines += inp.do_input(name);
        putln();

        let name = "../Compute/src/lib.rs";
        lines += inp.do_input(name);
        let name = "../Compute/examples/test1.rs";
        lines += inp.do_input(name);
        putln();

        let name = "../Output/src/lib.rs";
        lines += inp.do_input(name);
        let name = "../Output/examples/test1.rs";
        lines += inp.do_input(name);
        putln();

        print!("\n total lines: {}", lines);
        print!("\n\n That's all Folks!\n\n");
    }

Output

     -- DataFlowStructure::Executive --

     file "./src/main.rs" has 48 lines of code

     file "../Input/src/lib.rs" has 51 lines of code
     file "../Input/examples/test1.rs" has 17 lines of code

     file "../Compute/src/lib.rs" has 56 lines of code
     file "../Compute/examples/test1.rs" has 29 lines of code

     file "../Output/src/lib.rs" has 26 lines of code
     file "../Output/examples/test1.rs" has 15 lines of code

     total lines: 242

     That's all Folks!

cargo.toml

    [package]
    name = "executive"
    version = "0.1.0"
    authors = ["James W. Fawcett"]
    edition = "2018"

    # See more keys ...

    [dependencies]
    input = { path = "../Input" }

Input::lib.rs

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Input::lib.rs                        //
    //   - Attempts to return line count from file             //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////
    /*
      Note:
      - Input owns and instantiates Compute.
      - It attempts to open file and pass to Compute
        for processing.
      - Returns line count if successful
    */
    use compute::*;

    mod file_utilities;
    use file_utilities::{open_file_for_read};

    #[derive(Debug)]
    pub struct Input {
        name: String,
        compute: Compute
    }
    impl Input {
        pub fn new() -> Input {
            Input {
                name: String::new(),
                compute: Compute::new()
            }
        }
        pub fn do_input(&mut self, name: &str) -> usize {
            let mut lines: usize = 0;
            self.name = name.to_string();
            let rslt = open_file_for_read(name);
            if let Ok(file) = rslt {
                self.compute.do_compute(name, file);
                lines = self.compute.lines();
            }
            else {
                print!("\n can't open file {:?}", name);
            }
            lines
        }
    }

    #[cfg(test)]
    mod tests {
        #[test]
        fn it_works() {
            assert_eq!(2 + 2, 4);
        }
    }

file_utilities module

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Input::file_utilities.rs             //
    //   - Input attempts to open named file and return File   //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////
    /*
      This code may be useful for other programs so it is
      factored into a module here.
    */
    #![allow(dead_code)]

    use std::fs::*;
    use std::io::{Read, Error, ErrorKind};

    pub fn open_file_for_read(file_name:&str)
        ->Result<File, std::io::Error> {
        let rfile = OpenOptions::new()
            .read(true)
            .open(file_name);
        rfile
    }

    pub fn read_file_to_string(f:&mut File)
        -> Result<String, std::io::Error> {
        let mut contents = String::new();
        let bytes_rslt = f.read_to_string(&mut contents);
        if bytes_rslt.is_ok() {
            Ok(contents)
        }
        else {
            Err(Error::new(ErrorKind::Other, "read error"))
        }
    }

test1.rs

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Input::test1.rs                      //
    //   - Attempts to return line count from file             //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////

    use input::*;

    fn main() {
        print!("\n -- input::test1 --\n");

        let mut inp = Input::new();
        let name = "./src/lib.rs";
        let lines = inp.do_input(name);
        print!("\n received {} lines from compute", lines);

        print!("\n\n That's all Folks!\n\n");
    }

Test Output

     -- input::test1 --

     file "./src/lib.rs" has 51 lines of code
     received 51 lines from compute

     That's all Folks!

cargo.toml

    [package]
    name = "input"
    version = "0.1.0"
    authors = ["James W. Fawcett"]
    edition = "2018"

    # See more keys ...

    [dependencies]
    compute = { path = "../Compute" }

Compute::lib.rs

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Compute::lib.rs                      //
    //   - Attempts to read opened file to string, count lines //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////
    /*
      Note:
      - creates instance of Output
      - attempts to read file to string and count its lines
      - sends results to Output
    */
    use std::fs::*;
    use output::{Output};

    mod file_utilities;
    use file_utilities::read_file_to_string;

    #[derive(Debug)]
    pub struct Compute {
        lines: usize,
        out: Output
    }
    impl Compute {
        pub fn new() -> Compute {
            Compute {
                lines: 0,
                out: Output::new()
            }
        }
        pub fn do_compute(&mut self, name: &str, mut file:File) {
            let rslt = read_file_to_string(&mut file);
            if let Ok(contents) = rslt {
                self.lines = 1;
                for ch in contents.chars() {
                    if ch == '\n' {
                        self.lines += 1;
                    }
                }
                self.out.do_output(name, self.lines);
            }
            else {
                print!("\n could not read {:?}", name);
            }
        }
        pub fn lines(&self) -> usize {
            self.lines
        }
    }

    #[cfg(test)]
    mod tests {
        #[test]
        fn it_works() {
            assert_eq!(2 + 2, 4);
        }
    }

Module file utilities

    Module copied from input/src

test1.rs

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Compute::test1.rs                    //
    //   - Attempts to read opened file to string, count lines //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////

    use compute::*;
    use std::fs::*;
    use std::io::*;

    fn open_file_for_read(file_name:&str) -> Result<File> {
        let rfile = OpenOptions::new()
            .read(true)
            .open(file_name);
        rfile
    }

    fn main() {
        print!("\n -- compute::test1 --\n");

        let name = "./src/lib.rs";
        let rslt = open_file_for_read(name);
        if let Ok(file) = rslt {
            let mut compute = Compute::new();
            let _ = compute.do_compute(name, file);
        }

        print!("\n\n That's all Folks!\n\n");
    }

Output

     -- compute::test1 --

     file "./src/lib.rs" has 56 lines of code

     That's all Folks!

cargo.toml

    [package]
    name = "compute"
    version = "0.1.0"
    authors = ["James W. Fawcett"]
    edition = "2018"

    # See more keys ...

    [dependencies]
    output = { path = "../Output" }

Output::lib.rs

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Output::lib.rs                       //
    //   - Sends results to console                            //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////

    #[derive(Debug)]
    pub struct Output { }

    impl Output {
        pub fn new() -> Output {
            Output {}
        }
        pub fn do_output(&self, name: &str, lines: usize) {
            print!(
                "\n file {:?} has {} lines of code",
                name, lines
            );
        }
    }

    #[cfg(test)]
    mod tests {
        #[test]
        fn it_works() {
            assert_eq!(2 + 2, 4);
        }
    }

test1.rs

    /////////////////////////////////////////////////////////////
    // DataFlowStructure::Output::test1.rs                     //
    //   - Sends results to console                            //
    // Jim Fawcett, https://JimFawcett.github.io, 04 Mar 2021  //
    /////////////////////////////////////////////////////////////

    use output::*;

    fn main() {
        print!("\n -- test Output --\n");

        let out = Output::new();
        out.do_output("SomeFile.rs", 3);

        print!("\n That's all Folks!\n\n");
    }

Test Output

     -- test Output --

     file "SomeFile.rs" has 3 lines of code
     That's all Folks!

cargo.toml

    [package]
    name = "output"
    version = "0.1.0"
    authors = ["James W. Fawcett"]
    edition = "2018"

    # See more keys ...

    [dependencies]
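A note on the counting rule: Compute::do_compute starts at 1 and adds one for each '\n', so a file that ends with a newline is reported with one more line than the standard library's `str::lines()` would give, and an empty file counts as one line. The sketch below places the two rules side by side; the function names `count_like_compute` and `count_with_lines` are hypothetical and do not appear in the repository.

```rust
/// Same rule as the article's Compute: start at 1, add one per '\n'.
pub fn count_like_compute(contents: &str) -> usize {
    1 + contents.chars().filter(|&ch| ch == '\n').count()
}

/// Alternative rule: count the items yielded by the standard library's
/// line iterator, which does not yield an empty final line after a
/// trailing newline.
pub fn count_with_lines(contents: &str) -> usize {
    contents.lines().count()
}
```

The two agree for text without a trailing newline (for "a\nb" both return 2) and differ by one otherwise (for "a\nb\n" the first returns 3 and the second 2). Either rule may be reasonable; which one fits depends on whether the trailing position after a final newline should count as a line.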

3. Epilogue

The five design alternatives considered here:
  1. Monolithic Structure
  2. Factored Structure
  3. DataFlow Structure
  4. TypeErase Structure
  5. PlugIn Structure
are progressively more flexible, eventually resulting in reusable components, but also increasingly complex. Where you settle in these alternatives is determined by design context. Is this a one-of-a-kind project that you want to finish quickly or is it heading for production code that will be maintained by more than one developer?