The primary concept for the ∞DL model is that data objects generated in a research laboratory represent
unique events of data collection, traceable to a human data producer, the equipment used, the research
subject(s) (e.g., animals, plants, soil samples), date/time, place, and the context of data production
in the form of a global research process.
Architecture:
Data objects are divided into Raw Data, Derived Data, and Report objects. Raw Data objects are treated
with special deference as representations of unique events in a research process, and are first-class
citizens in the ∞DL universe.
Data files in formats “foreign” to ∞DL must first be converted to the ∞DL data format using an API
provided by ∞DL. Converted data objects are accepted into ∞DL following validation, which includes a
check for the presence of mandatory attributes specifying the context of data object production, and a
best-effort check on data meaningfulness.
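The acceptance step could look roughly like the following. This is a minimal sketch, not the ∞DL API: the attribute names are assumptions derived from the context items listed above, and the "meaningfulness" test (non-empty, finite numeric samples) is only one possible best-effort heuristic.

```python
import math
from typing import List, Sequence

# Hypothetical attribute names; the actual ∞DL mandatory set is not given in the text.
MANDATORY_ATTRIBUTES = ("producer", "equipment", "subjects", "timestamp", "place", "process")


def missing_attributes(metadata: dict) -> List[str]:
    """Return the mandatory attributes that are absent or empty."""
    return [name for name in MANDATORY_ATTRIBUTES
            if metadata.get(name) in (None, "", [])]


def looks_meaningful(samples: Sequence[float]) -> bool:
    """Assumed heuristic: at least one sample, all of them finite numbers."""
    return len(samples) > 0 and all(
        isinstance(x, (int, float)) and math.isfinite(x) for x in samples
    )


def validate(metadata: dict, samples: Sequence[float]) -> List[str]:
    """Collect every reason a converted object would be rejected (empty list = accept)."""
    problems = [f"missing attribute: {name}" for name in missing_attributes(metadata)]
    if not looks_meaningful(samples):
        problems.append("samples are empty or contain non-finite values")
    return problems
```

Returning a list of problems rather than a single boolean lets a conversion tool report everything that needs fixing in one pass.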
Once accepted by ∞DL, data objects are “sealed” with an output of a hash function (e.g., SHA256 or SHA512)
computed on essential features of the data object and serving as a world-wide unique data ID.
Data object integrity can be checked at any time by comparing the ID with the hash function value
computed by ∞DL for validation. Permanent deletion of ∞DL objects is out of the control of an end user and
is performed by the system according to established policies. A trace of each deleted data object remains
in the system indefinitely.
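Sealing and the later integrity check can be illustrated with Python's standard `hashlib`. This is a sketch under stated assumptions: the text does not say which "essential features" are hashed, so the example hashes a canonical JSON encoding of the metadata followed by the raw payload, and the function names `seal` and `is_intact` are hypothetical.

```python
import hashlib
import json


def seal(metadata: dict, payload: bytes) -> str:
    """Compute a SHA-256 ID over the metadata and payload (assumed 'essential features')."""
    # Sorted keys and fixed separators make the encoding independent of dict ordering.
    canonical = json.dumps(
        metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    digest = hashlib.sha256()
    # A length prefix keeps the metadata/payload boundary unambiguous.
    digest.update(len(canonical).to_bytes(8, "big"))
    digest.update(canonical)
    digest.update(payload)
    return digest.hexdigest()


def is_intact(object_id: str, metadata: dict, payload: bytes) -> bool:
    """Recompute the hash and compare it with the stored ID."""
    return seal(metadata, payload) == object_id
```

Because the ID is derived from the content, any change to the hashed features yields a different ID, which is what makes the comparison a meaningful integrity check.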
Geospatially, or in the context of the global research process, the ∞DL system is a two-tiered federation.
Any individual or research group can create a ∞DL Data Producing Node (DPN) using ∞DL Client software.
An arbitrary number of DPNs can be initiated and constitute the first tier. ∞DL Authority constitutes
the second tier and provides mapping services to dispatch globally-generated data requests to individual
DPNs.
To enable global visibility and access to data, DPNs must register with a ∞DL
Authority, but this is not necessary for day-to-day data archiving and access operations within a DPN.
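The mapping role of the second tier can be pictured as a small registry. The sketch below is purely illustrative: the class and method names (`Authority`, `register`, `announce`, `resolve`) and the endpoint strings are assumptions, not part of ∞DL, and a real Authority would be a network service rather than an in-memory object.

```python
from typing import Dict, Optional


class Authority:
    """Toy second-tier registry mapping data IDs to the DPN that holds them."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, str] = {}  # DPN id -> endpoint
        self._owners: Dict[str, str] = {}     # data object id -> DPN id

    def register(self, dpn_id: str, endpoint: str) -> None:
        """A DPN opts in to global visibility."""
        self._endpoints[dpn_id] = endpoint

    def announce(self, dpn_id: str, object_id: str) -> None:
        """A registered DPN makes one of its objects globally discoverable."""
        if dpn_id not in self._endpoints:
            raise KeyError(f"DPN {dpn_id!r} is not registered")
        self._owners[object_id] = dpn_id

    def resolve(self, object_id: str) -> Optional[str]:
        """Dispatch a global request: return the owning DPN's endpoint, if known."""
        owner = self._owners.get(object_id)
        return self._endpoints[owner] if owner is not None else None
```

An unregistered DPN never calls `register` or `announce` and simply keeps archiving locally; its objects stay invisible to `resolve`.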
Implementation:
∞DL Client software uses established Cloud storage services, e.g., Google Drive or Amazon Web Services (AWS), mostly as
lightweight Cloud file servers. Business logic for data submission and access is realized by
∞DL Client software. The recommended configuration maintains a fully redundant copy of all data locally
(storage is cheap), which enables a ∞DL Client to work autonomously for any amount of time.
Data access requests are managed by ∞DL Client software and produce Views of the data objects.
The Views enable users to extract subranges of viewed data and to export them to “foreign” formats.
∞DL provides file reader and writer APIs which can be used for processing and analyzing data directly
from native ∞DL data files, without exporting to “foreign” formats.
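As an illustration of the View idea, the following sketch models a View over a simple time series, with subrange extraction and export to CSV as one example of a “foreign” format. The `View` class, its methods, and the time/value layout are all assumptions made for the example; the native ∞DL format and reader/writer APIs are not described in the text.

```python
import csv
import io
from typing import Sequence


class View:
    """Hypothetical read-only view over (time, value) samples of a data object."""

    def __init__(self, times: Sequence[float], values: Sequence[float]) -> None:
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        self._times = tuple(times)
        self._values = tuple(values)

    def __len__(self) -> int:
        return len(self._times)

    def subrange(self, start: float, stop: float) -> "View":
        """Return a new View holding samples with start <= time < stop."""
        kept = [(t, v) for t, v in zip(self._times, self._values) if start <= t < stop]
        return View([t for t, _ in kept], [v for _, v in kept])

    def to_csv(self) -> str:
        """Export the viewed samples as CSV text."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["time", "value"])
        writer.writerows(zip(self._times, self._values))
        return buffer.getvalue()
```

Subranges return new View objects rather than modifying data in place, which matches the idea that sealed data objects themselves are never altered by access operations.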
DPNs maintain a DPN Ledger which registers all events of data object submission, replacement,
moving, or “trashing” (which is decoupled from actual permanent deletion), as a sequence of
blockchained records. A DPN Ledger can only grow with time, making corruption more difficult with
the addition of every record. The DPN Ledger, together with data object sealing and other DPN features,
supports integrity of the entire history of DPN evolution.
Modification of an object is easily discoverable. Deliberate corruption of the entire DPN
is virtually impossible, or would require a level of user sophistication out of proportion to the
perceived rewards of such fraud.
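The append-only, blockchained DPN Ledger can be sketched as a hash chain in which every record stores the hash of its predecessor. This is a minimal illustration, assuming SHA-256 and a hypothetical record layout (`prev`, `event`, `object_id`, `timestamp`, `hash`); the actual ledger format is not specified in the text.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record
FIELDS = ("prev", "event", "object_id", "timestamp")


def _digest(body: Dict[str, str]) -> str:
    """Hash a record body using a canonical JSON encoding."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class Ledger:
    """Toy append-only ledger of DPN events (submit, replace, move, trash)."""

    def __init__(self) -> None:
        self.records: List[Dict[str, str]] = []

    def append(self, event: str, object_id: str, timestamp: str) -> Dict[str, str]:
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {"prev": prev, "event": event, "object_id": object_id, "timestamp": timestamp}
        record = dict(body, hash=_digest(body))
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Walk the chain; any edited, removed, or reordered record breaks a link."""
        prev = GENESIS
        for record in self.records:
            body = {key: record[key] for key in FIELDS}
            if record["prev"] != prev or record["hash"] != _digest(body):
                return False
            prev = record["hash"]
        return True
```

Altering an early record changes its hash, so every later record would also have to be recomputed to hide the change; this is why each added record makes undetected corruption harder.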