S T B H P N

Project #1 - Key/Value Database

Spring 2018

Version 1.1, Due Date: Tuesday, February 6th

Purpose:

There is currently a lot of technical interest in "Big Data". Extreme examples are: data collection and analyses from the Large Hadron Collider, the Sloan Sky Survey, analyses of Biological Genomes, measuring weather patterns, collecting data for global climate models, and analyzing client interactions in social networks.

Conventional SQL databases are not well suited for these kinds of applications. While they have worked very well for many business applications and record keeping, they get overwhelmed by massive streams of data. Developers are turning to "noSQL" databases like MongoDB, couchDB, Redis, and Big Table to handle massive data collection and analyses.

In this project we will explore how a non SQL database can be constructed and used. In the second project we will use this database to store package information in a code repository.

Note:
This project is similar to Project #2, Spring 2017. However, there are enough differences in requirements to leave you with significant design and implementation work. You will find prototype code in the Course Repository.
The prototype provides code that satisfies Requirement #2. That allows you to look at C++ code related to your project to get you started, let your begin learning the language syntax quickly, and see how code should be structured.

Requirements:

Your NoSqlDb Key/Value Database project:
  1. (1) Shall be implemented in C++ using the facilities of the standard C++ Libraries and Visual Studio 2017, as provided in the ECS clusters.
  2. (1) Shall use the C++ standard library's streams for all console I/O and new and delete for all heap-based memory management1.
  3. (1) Shall implement a generic key/value in-memory database where each value consists of:
    • Metadata:
      • A name string
      • A text description of the item
      • A DateTime instance recording the date and time the value was written to the database.
      • a finite number (possibly zero) of child relationships with other values. Each element of the child relationships collection is the key of another item in the database. Any item holding a key in this collection is the parent of the element accessed by the key.
    • An instance of the generic type1. This might be a string, a container of a set of values of the same type, or some other collection of data captured in some, perhaps custom, data structure. We will refer to this instance as the database element's payload.
  4. (1) Shall support addition and deletion of key/value pairs.
  5. (2) Shall support editing of values including the addition and/or deletion of relationships, editing text metadata, and replacing an existing value's instance with a new instance. Editing of keys is forbidden.
    If you delete child relationships, e.g., delete keys from an element's child collection, you are forbidden to delete the elements accessed from those keys from the database.
  6. (4) Shall support queries2 for:
    • The value of a specified key.
    • The children of a specified key.
    • The set of all keys matching a specified regular-expression pattern.
    • All keys that contain a specified string in their metadata section, where the specification is based on a regular-expression pattern.
    • All keys that contain values written within a specified time-date interval. If only one end of the interval is provided shall take the present as the other end of the interval.
  7. (3) Shall support queries on the set of keys returned by a previous query3, e.g., an "and"ing of multiple queries. Shall also support queries on the union of results of one or more previous queries, e.g., an "or"ing of multiple queries.
  8. (3) Shall, on command, persist a colllection of database contents, defined by a collection of keys, to an XML file4,5. It is intended that this process be reversible, e.g., the database can be restored or augmented6 from an existing XML file as well as write its contents out to one or more XML files.
  9. (2) Shall implement a Payload type that contains a string, which for Project #2, will be the path to a file in the Repository. The Payload type shall also contain a std::vector<std::string>, where each string is the name of a category associated with the Payload file. For the Repository, the key will be a file name, so the combination of key and payload will provide details about a file stored in the Repository.
  10. (1) Shall provide, in your implementation, at least the following packages: Executive, DBCore, Query, Test.
  11. (2) Shall be submitted with contents, in the form of an XML file, that describe your project's package structure and dependency relationships that can be loaded when your project is graded.
  12. (3) Shall be accompanied by a test executive that clearly demonstrates you've met all the functional requirements #2-#9, above7. If you do not demonstrate a requirement you will not get credit for it even if you have it correctly implemented.
  13. (1) Shall provide a pdf file containing a package diagram8 of your project.

  1. This instance of the generic type is the "payload" - the data - that we store in the database. The metadata simply provides information about this "payload". Generic types in C++ are template classes.
  2. One of the things your project grade will be based on is the language you develop to create and use queries.
  3. The intent of this requirement is to support compound queries. That is, the elements that match a query may have conditions on their keys, children, name, time-date, and/or payload, imposed by previous queries.
  4. XML is easy to parse with existing tools so it makes a good storage mechanism for data that will be examined by other tools. It is not very space-efficient so a "production" version of this database might use some other more efficient, but more complex storage implementation.
  5. For value instances that are not strings, you will need to provide a toString() method to persist the instance into the text of an XML element. If the value is an instance of a structured type, then the persistence is slightly more complicated. That will be discussed in one of our Friday help sessions.
  6. By augmented we mean that the database already contains data and the contents of the XML file are read and inserted into the database without destroying the original data.
  7. This requirement is intended to make you design a test sequence that displays each requirement's processing in a separate function, called in the Executive package's main function.
  8. We recommend that you use gliffy to construct the diagram, then capture the diagram image using the windows snipping tool and paste into a word document. That you can save as a pdf document. Gliffy is available in the chrome webstore.
  9. So, you can find all the db items in a given category by enumerating the category's list of keys. This facility is very like conventional database indexes. Note that you could implement this requirement with another instance of your NoSqlDB.

What you need to know:

In order to successfully meet these requirements you will need to:
  1. Write C++ code and use basic facilities of the C++ standard libraries.
  2. Use the STL Containers.
  3. Read and Write XML files.