
Each of the subdirectories with prefix "dataset" contains
the following files:
query.k0
query.t0
query.t1.k1
query.t1.k10
query.t1.k100
query.t10.k1
query.t100.k1
source.gz
The source file contains records to build up a temporal database. A record
consists of three components:
1. a flag f
2. a key value k
3. an integer i
The flag denotes whether the record is to be inserted (f = 1),
updated (f = 2), or deleted (f = 0).
The key value is used for identifying the record at a given point in time.
Note that the key does not identify a record in the entire temporal database.
The integer is a piece of information associated with the key. Here I stored
the number of update operations performed on the key.
The source file is read line by line. For each line, we take the line number
as the point in time at which the corresponding operation is performed on the
temporal database. For example, if "1 908117189 0" is read from line i,
the record with key 908117189 and time period (i,*) is inserted into
the database. The special value "*" indicates that the record is alive.
whereas the other 90,000 operations are a mix of insertions and updates.
The ratio of updates and insertions is known from the suffix of the directory
name: the value after the letter u gives the percentage of updates, the value
after the letter i gives the percentage of insertions. The sum must always
be 100.
The distribution of the records inserted into the database is uniform.
Moreover, the update operation is performed on a record which
is chosen randomly from the set of live records.
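
The replay of the source file described above can be sketched in Python as
follows. This is only a minimal sketch under several assumptions that the
description does not state: fields are separated by whitespace, line numbers
start at 1, a delete closes the live version of a key at the current time,
and an update closes the live version and opens a new one. The names
replay, replay_file and the tuple layout are hypothetical.

```python
import gzip
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical layout: (key, info, t_start, t_end).
# t_end is None while the record is alive (the "*" of the description).
Version = Tuple[int, int, int, Optional[int]]


def replay(lines: Iterable[str]) -> List[Version]:
    """Replay source records; the line number is used as the time value."""
    versions: List[Version] = []
    live: Dict[int, int] = {}  # key -> index of its live version in `versions`
    for t, line in enumerate(lines, start=1):  # assumption: 1-based line numbers
        parts = line.split()
        if not parts:
            continue  # a blank line still consumes a time value
        flag, key, info = (int(p) for p in parts[:3])
        if flag not in (0, 1, 2):
            raise ValueError(f"unknown flag {flag} on line {t}")
        if flag in (0, 2):
            # Assumption: delete and update both end the live version at time t.
            idx = live.pop(key, None)
            if idx is not None:
                k, i, t_start, _ = versions[idx]
                versions[idx] = (k, i, t_start, t)
        if flag in (1, 2):
            # Insert, or the new version created by an update, is alive from t on.
            live[key] = len(versions)
            versions.append((key, info, t, None))
    return versions


def replay_file(path: str) -> List[Version]:
    """Read a gzip-compressed source file such as source.gz."""
    with gzip.open(path, "rt") as fh:
        return replay(fh)
```

For instance, replay(["1 908117189 0"]) yields a single version with time
period (1, None), which corresponds to (1,*) in the notation used here.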

The query files query.tx.ky contain a set of 1000 records (kl,ku,tl,tu,n).
Each record is used for running a range-period query on the database.
The query retrieves those records whose key is in [kl,ku] and which
were alive in the time period [tl,tu]. The fifth component
gives the number of qualifying records (this is just to check whether the
search algorithm is ok). Note that all queries from the query file have the
same size, i.e., they cover 0.01 % of the entire data space.
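
A brute-force check of one query record (kl,ku,tl,tu,n) might look like the
sketch below. It is not an index structure, only a reference count to compare
against n. The tuple layout and the function names are hypothetical, and the
lifespan of a record is assumed to be half-open, [t_start, t_end); whether
the end point is included is not stated here.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical layout: (key, info, t_start, t_end); t_end is None if alive.
Version = Tuple[int, int, int, Optional[int]]


def qualifies(v: Version, kl: int, ku: int, tl: int, tu: int) -> bool:
    """True if the key lies in [kl,ku] and the lifespan intersects [tl,tu]."""
    key, _info, t_start, t_end = v
    if not (kl <= key <= ku):
        return False
    # Assumption: lifespan is [t_start, t_end); None means still alive.
    return t_start <= tu and (t_end is None or t_end > tl)


def count_qualifying(versions: Iterable[Version],
                     kl: int, ku: int, tl: int, tu: int) -> int:
    """Count qualifying versions by scanning all of them."""
    return sum(1 for v in versions if qualifies(v, kl, ku, tl, tu))
```

The result of count_qualifying can then be compared with the fifth
component n of the query record.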
The parameters x and y (known from the name of the query file) indicate
the so-called shape factor of the queries. One of the values is 1, whereas
the other is 1, 10, or 100. This indicates how large the key range
is in comparison to the time range, or vice versa. For example, the file
query.t10.k1 contains 1000 queries of size 0.01%, where the relative
time range (3.1%) is 10 times larger than the relative key range (0.31%).
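
The arithmetic behind the shape factor can be reproduced with a few lines of
Python. The sketch assumes that the query size is the product of the relative
time range and the relative key range, and that the two ranges stand in the
ratio x : y; the function name relative_extents is hypothetical.

```python
import math
from typing import Tuple


def relative_extents(size: float, x: float, y: float) -> Tuple[float, float]:
    """Return (time_range, key_range) as fractions of the data space.

    Assumptions: time_range * key_range == size and time_range / key_range == x / y.
    """
    time_range = math.sqrt(size * x / y)
    key_range = math.sqrt(size * y / x)
    return time_range, key_range


# query.t10.k1: size 0.01 % of the data space, x = 10, y = 1
t, k = relative_extents(0.0001, 10, 1)
print(f"time {t:.2%}, key {k:.2%}")  # prints "time 3.16%, key 0.32%"
```

Up to rounding, these values agree with the figures quoted for query.t10.k1.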

The query file query.t0 contains 100 time slice queries, whereas the file
query.k0 contains 100 pure key queries.

Have fun with the data and the queries!

Bernhard