∞DL Data Objects in Context of the Global Research Process
The global realization of ∞DL starts with the creation of individual Data Producing Nodes (DPNs),
driven by individuals and small groups. Each DPN is expected to make a best effort to properly
represent the organizational structure of the data-producing site.
Despite the independence of individual DPN creators, submitted data objects must be attached to
the global ∞DL Tree, which, in addition to the data objects’ unique IDs, is meant to provide a
unique path to each data object in the ∞DL universe.
The ∞DL Tree is a purely abstract concept unrelated to any geospatial or Internet locations.
The ∞DL Tree starts with a global root ∞DL, which is nothing more than a pledge of allegiance
to the concept. DPNs are expected to provide a valid DPN path that starts from the global root
and faithfully reflects the administrative structure. A realistic example is
“∞DL/edu.utah/SOM/CVRTI/Zaitsev lab”
From the DPN root “/Zaitsev lab” a subtree will grow, which will include Project Nodes and Data Nodes.
A complete path to a DataObject could look like this:
“∞DL/edu.utah/SOM/CVRTI/Zaitsev lab/RVOT conduction/Stress kinases and RVOT vulnerability
/TTX+KN93/exp2018_11_21A/Raw Data”
Further, ∞DL supports a syntax for addressing and accessing subranges of DataItems.
Within a DataObject, each DataItem is addressed using a #name suffix, and a subrange of data within
a DataItem is addressed using an @rangeString suffix.
Extending the example above, assume the DataObject named “Raw Data” has a DataItem called “movie 003”,
which is a 3D array with dimensions X, Y, and frameNo. Then the full ∞DL Tree path to frame 5
in “movie 003” will be as follows:
“∞DL/edu.utah/SOM/CVRTI/Zaitsev lab/RVOT conduction/Stress kinases and RVOT vulnerability
/TTX+KN93/exp2018_11_21A/Raw Data#movie 003@*,*,5”
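The addressing syntax described above can be illustrated with a minimal Python sketch. It assumes that node and item names never contain the characters “/”, “#”, or “@”, and that the range string is a comma-separated list with one token per dimension, where “*” selects the whole dimension. Neither assumption is stated in the text, and the names DLAddress and parse_dl_path are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

ROOT = "∞DL"  # global root of the ∞DL Tree


@dataclass
class DLAddress:
    nodes: List[str]                  # path components below the global root
    item: Optional[str]               # DataItem name from the '#' suffix, if any
    range_spec: Optional[List[str]]   # per-dimension tokens from the '@' suffix, if any


def parse_dl_path(path: str) -> DLAddress:
    """Split an ∞DL path into tree nodes, DataItem name, and subrange.

    Assumption (not from the source): '/', '#', and '@' never occur
    inside node or item names.
    """
    # Peel off the optional @rangeString suffix first.
    body, has_range, range_str = path.partition("@")
    range_spec = [tok.strip() for tok in range_str.split(",")] if has_range else None

    # Then peel off the optional #name suffix.
    body, has_item, item_name = body.partition("#")
    item = item_name if has_item else None

    parts = body.split("/")
    if parts[0] != ROOT:
        raise ValueError(f"path must start at the global root {ROOT!r}")
    return DLAddress(nodes=parts[1:], item=item, range_spec=range_spec)


address = parse_dl_path(
    "∞DL/edu.utah/SOM/CVRTI/Zaitsev lab/RVOT conduction/"
    "Stress kinases and RVOT vulnerability/TTX+KN93/exp2018_11_21A/"
    "Raw Data#movie 003@*,*,5"
)
```

Here address.nodes ends with the DataObject name “Raw Data”, address.item is “movie 003”, and address.range_spec holds one token per array dimension.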
With this federated approach to independent DPN creation, two types of name conflicts or ambiguities
may arise. A request to access a Project Node may be ambiguous due to multiple paths leading to the
same Project Node. For example, the path “∞DL/edu.utah/SOM/CVRTI” probably refers to the same entity
as the path “∞DL/edu.utah/School of Medicine/CVRTI”.
This situation could arise, for example,
if two different labs at the CVRTI started independent DPNs without good agreement on the names
of the administrative units. This is mostly an administrative problem, however, and it can be
remedied by the ∞DL Authority storing lists of equivalent names for the various ORG and DIV Nodes and
traversing all nodes with possibly equivalent names.
Alternatively, the ∞DL Authority can coordinate node name disambiguation among DPNs
by asking each DPN to rename the nodes in question.
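As a rough illustration of the first remedy, the sketch below keeps a table of equivalent node names and maps every path to a single canonical form before lookup. This is a simplification: the text speaks of traversing all nodes with possibly equivalent names, whereas the sketch rewrites aliases to one preferred name. The table contents, the EQUIVALENTS name, and the canonicalize function are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical equivalence table kept by the ∞DL Authority.
# Key: (canonical parent path, alias used by some DPN); value: canonical name.
EQUIVALENTS: Dict[Tuple[str, str], str] = {
    ("∞DL/edu.utah", "School of Medicine"): "SOM",
}


def canonicalize(path: str, equivalents: Dict[Tuple[str, str], str] = EQUIVALENTS) -> str:
    """Rewrite each node name to its canonical equivalent, walking from the root."""
    parts = path.split("/")
    canonical = [parts[0]]
    for name in parts[1:]:
        parent = "/".join(canonical)
        # Fall back to the name itself when no equivalent is registered.
        canonical.append(equivalents.get((parent, name), name))
    return "/".join(canonical)


same_node = (
    canonicalize("∞DL/edu.utah/School of Medicine/CVRTI")
    == canonicalize("∞DL/edu.utah/SOM/CVRTI")
)
```

Keying the table on the parent path as well as the alias keeps an abbreviation such as “SOM” from being rewritten under an unrelated organization.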
It is highly unlikely, but theoretically possible, that two independent DPNs have exactly the same
∞DL Path to a DataObject. However, the actual content cannot be exactly the same, because the
system provides no legitimate way to duplicate a DataObject. Even if two DataObjects
contain the same set of DataItems, their DATE_TIME attributes will differ.
If two DataObjects with exactly the same ∞DL Path differ in content, then their Hash
strings (their IDs) will be different (barring a mathematically improbable case discussed below).
The solution is then to rename one of the two DataObjects. The ∞DL Path will change, but the ID
of the object will not, because the object name is not used in the computation of the Hash value.
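The separation between name and ID can be sketched as follows. The hash function and serialization are not specified in this section, so SHA-256 over a sorted JSON dump of the attributes and DataItems is used purely as a stand-in; the DataObject class, its fields, and the timestamps are hypothetical illustrations.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str          # part of the ∞DL Path, NOT part of the ID
    attributes: dict   # includes DATE_TIME
    items: dict        # DataItem name -> content (JSON-serializable here)

    def object_id(self) -> str:
        """Hash attributes and items only; the object name is deliberately excluded.

        Assumption: SHA-256 over canonical JSON stands in for the real Hash.
        """
        payload = json.dumps(
            {"attributes": self.attributes, "items": self.items},
            sort_keys=True,
            ensure_ascii=False,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Two objects with the same name and DataItems but different DATE_TIME values.
first = DataObject("Raw Data", {"DATE_TIME": "2018-11-21T10:00:00"}, {"movie 003": [1, 2, 3]})
second = DataObject("Raw Data", {"DATE_TIME": "2018-11-21T10:05:00"}, {"movie 003": [1, 2, 3]})

id_before_rename = first.object_id()
first.name = "Raw Data (renamed)"   # resolves the path conflict
id_after_rename = first.object_id()
```

In this sketch the two objects receive different IDs because of DATE_TIME alone, and renaming the first object changes its path while leaving its ID untouched.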
Lastly, there is a theoretical possibility that two DataObjects with different content will have
exactly the same Hash value. Such cases are expected to be extremely rare and will be addressed lazily,
only when a conflict is recognized.
The remedy is to replace one of the conflicting objects with a copy of itself to which a
“corrective attribute” has been added, whose only purpose is to alter the Hash value.
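A minimal sketch of the corrective-attribute remedy is given below. It assumes SHA-256 over a sorted JSON dump as a stand-in for the Hash, an attribute key named CORRECTIVE, and an integer counter as its value; none of these details come from the text. Because a real SHA-256 collision cannot be produced on demand, the example simulates one by treating the object's current ID as already taken.

```python
import hashlib
import json

CORRECTIVE_KEY = "CORRECTIVE"  # hypothetical attribute name


def object_id(attributes: dict, items: dict) -> str:
    """Stand-in Hash: SHA-256 over canonical JSON of attributes and items."""
    payload = json.dumps(
        {"attributes": attributes, "items": items},
        sort_keys=True,
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_corrective_attribute(attributes: dict, items: dict, taken_ids: set) -> dict:
    """Return a copy of `attributes` with a corrective attribute whose
    value is chosen so that the resulting ID is not in `taken_ids`."""
    counter = 1
    while True:
        candidate = dict(attributes)          # leave the original untouched
        candidate[CORRECTIVE_KEY] = counter
        if object_id(candidate, items) not in taken_ids:
            return candidate
        counter += 1


attrs = {"DATE_TIME": "2018-11-21T10:00:00"}
items = {"movie 003": [1, 2, 3]}

# Simulate a recognized conflict: another object already owns this ID.
taken = {object_id(attrs, items)}
fixed_attrs = add_corrective_attribute(attrs, items, taken)
new_id = object_id(fixed_attrs, items)
```

The DataItems are left unchanged; only the attribute set grows by one entry, which is enough to move the object to a fresh ID.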