
∞ Data Library - Research Process Context

Author: Alexey Zaitsev


∞DL Data Objects in Context of the Global Research Process

The global realization of ∞DL starts with the creation of individual Data Producing Nodes (DPNs), driven by individuals and small groups. Each DPN is expected to make a best effort to properly represent the organizational structure of the data-producing site. Despite the independence of individual DPN creators, submitted data objects must be attached to the global ∞DL Tree, which, in addition to the data objects’ unique IDs, is supposed to provide a unique path to each data object in the ∞DL universe. The ∞DL Tree is a purely abstract concept unrelated to any geospatial or Internet locations.

The ∞DL Tree starts with a global root, ∞DL, which is nothing more than a pledge of allegiance to the concept. DPNs are expected to provide a valid DPN path starting from the global root and faithfully reflecting the administrative structure. A realistic example is “∞DL/edu.utah/SOM/CVRTI/Zaitsev lab”. From the DPN root “/Zaitsev lab” a subtree will grow, which will include Project Nodes and Data Nodes. A complete path to a DataObject could look like this:

“∞DL/edu.utah/SOM/CVRTI/Zaitsev lab/RVOT conduction/Stress kinases and RVOT vulnerability/TTX+KN93/exp2018_11_21A/Raw Data”

Further, ∞DL supports a syntax for addressing and accessing subranges of DataItems. Within a DataObject, each DataItem is addressed using a #name suffix, and a subrange of data within a DataItem is addressed using an @rangeString suffix. Extending the example above, assume the DataObject named “Raw Data” has a DataItem called “movie 003”, which is a 3D array with dimensions X, Y, and frameNo. Then the full ∞DL Tree path to frame 5 in “movie 003” is as follows:

“∞DL/edu.utah/SOM/CVRTI/Zaitsev lab/RVOT conduction/Stress kinases and RVOT vulnerability/TTX+KN93/exp2018_11_21A/Raw Data#movie 003@*,*,5”

With this federated approach to independent DPN creation, two types of name conflicts or ambiguities may arise. A request to access a Project Node may be ambiguous due to multiple paths leading to the same Project Node.
For example, the path “∞DL/edu.utah/SOM/CVRTI” probably refers to the same entity as the path “∞DL/edu.utah/School of Medicine/CVRTI”. This situation could arise, for example, if two different labs at the CVRTI started independent DPNs without agreeing on the names of the administrative units. This is mostly an administrative problem, however, which can be remedied by the ∞DL Authority storing lists of equivalent names for various ORG and DIV Nodes and traversing all nodes with possibly equivalent names. Alternatively, the ∞DL Authority can coordinate node name disambiguation among DPNs by requesting that each DPN rename the nodes in question.

The second type of conflict is very unlikely but theoretically possible: two independent DPNs may have exactly the same ∞DL Path to a DataObject. However, the actual content cannot be exactly the same, because the system provides no legitimate way to duplicate a DataObject. Even if two DataObjects contain the same set of DataItems, their DATE_TIME attributes will differ. If two DataObjects with exactly the same ∞DL Path differ in content, then their Hash strings (their IDs) will also differ (barring the mathematically improbable case discussed below). The solution is then to rename one of the two DataObjects. The ∞DL Path will change, but the ID of the object will not, because the object name is not used in the computation of the Hash value.

Lastly, there is a theoretical possibility that two DataObjects with different content will have exactly the same Hash value. Such cases are expected to be extremely rare and will be addressed lazily, only when a conflict is recognized. The remedy is to replace one of the conflicting objects with the same object after adding a “corrective attribute” whose only purpose is to alter the Hash value.
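The addressing syntax described earlier (a node path, an optional #name suffix for a DataItem, and an optional @rangeString suffix for a subrange) can be illustrated with a small parser. This is only a sketch under stated assumptions: the names `Address` and `parse_address` are hypothetical, the characters "#" and "@" are assumed not to occur inside node or item names, and the range string is assumed to be a comma-separated list with one selector per dimension. How a selector such as "5" maps onto array indices is left open, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ROOT = "∞DL"  # global root of the ∞DL Tree


@dataclass(frozen=True)
class Address:
    """Hypothetical container for a parsed ∞DL Tree path."""
    nodes: Tuple[str, ...]                      # node names, root first
    item: Optional[str] = None                  # DataItem name after '#'
    subrange: Optional[Tuple[str, ...]] = None  # selectors after '@'


def parse_address(text: str) -> Address:
    # Assumption: '#' and '@' never appear inside node or item names.
    path, _, rest = text.partition("#")
    item, _, range_string = rest.partition("@")
    nodes = tuple(name.strip() for name in path.split("/"))
    if nodes[0] != ROOT:
        raise ValueError(f"path must start at the global root {ROOT!r}")
    subrange = None
    if range_string:
        # One selector per dimension; '*' is kept verbatim.
        subrange = tuple(s.strip() for s in range_string.split(","))
    return Address(nodes, item.strip() or None, subrange)


frame = parse_address(
    "∞DL/edu.utah/SOM/CVRTI/Zaitsev lab/RVOT conduction/"
    "Stress kinases and RVOT vulnerability/TTX+KN93/exp2018_11_21A/"
    "Raw Data#movie 003@*,*,5"
)
print(frame.nodes[-1], "|", frame.item, "|", frame.subrange)
```

Keeping the selectors as plain strings avoids committing to an indexing convention that the text does not define.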
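The first remedy for ambiguous Project Node paths, lists of equivalent names kept by the ∞DL Authority, might be sketched as follows. The table contents and the names `EQUIVALENT_NAMES`, `canonical_path`, and `same_node` are hypothetical. For brevity the sketch applies equivalences globally, whereas a real registry would presumably scope them to a particular parent node.

```python
# Hypothetical equivalence lists, keyed by an arbitrarily chosen canonical name.
EQUIVALENT_NAMES = {
    "SOM": {"SOM", "School of Medicine"},
}

# Reverse lookup: every known alias maps to its canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in EQUIVALENT_NAMES.items()
    for alias in aliases
}


def canonical_path(path: str) -> str:
    """Rewrite each node name to its canonical form, if one is registered."""
    names = (name.strip() for name in path.split("/"))
    return "/".join(ALIAS_TO_CANONICAL.get(name, name) for name in names)


def same_node(path_a: str, path_b: str) -> bool:
    """Two paths address the same node if their canonical forms match."""
    return canonical_path(path_a) == canonical_path(path_b)


print(same_node("∞DL/edu.utah/SOM/CVRTI",
                "∞DL/edu.utah/School of Medicine/CVRTI"))
```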
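The relationship between object content, the Hash-based ID, and the “corrective attribute” can also be illustrated in code. The text does not say which hash function or serialization ∞DL uses, so SHA-256 over a canonical JSON encoding is purely an assumption here, as are the function names and the attribute key `CORRECTIVE`. The one property taken from the text is that the object name never enters the hash, so renaming an object leaves its ID unchanged.

```python
import hashlib
import json


def object_id(attributes: dict, items: dict) -> str:
    """Hash attributes and DataItems only; the object's name and path are
    deliberately not part of the input (assumed encoding: sorted JSON)."""
    payload = json.dumps(
        {"attributes": attributes, "items": items},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def with_corrective_attribute(attributes: dict, value: int) -> dict:
    """Return a copy of the attributes with an extra entry whose only
    purpose is to change the resulting hash (hypothetical key name)."""
    corrected = dict(attributes)
    corrected["CORRECTIVE"] = value
    return corrected


# Same DataItems, different DATE_TIME: the IDs differ.
items = {"movie 003": [[0, 1], [2, 3]]}
first = {"DATE_TIME": "2018-11-21T10:00:00"}
second = {"DATE_TIME": "2018-11-21T10:00:05"}

id_first = object_id(first, items)
id_second = object_id(second, items)
id_corrected = object_id(with_corrective_attribute(first, 1), items)

print(id_first != id_second, id_corrected != id_first)
```

A real DataItem such as a 3D movie array would be hashed from its bytes rather than from JSON; the small nested list here only keeps the sketch self-contained.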