• Software

    XXL: eXtensible and fleXible Library

    (Java-library for advanced query processing)

    JEPC: Java Event Processing Connectivity

    (Java-middleware for uniform event processing)

    Data Structure Navigator

    (Visualization of data structures and algorithms)

    Datasets

    This page provides the datasets that we used in our experiments.

  • Spatial Data
     
    No. | Area       | Description         | Number of MBRs | Zipped size (MB) | Coverage | Source | Used for experiments in
    M1  | L.A.       | streets             | 131,461        | 1.35             | 0.03     | Tiger  | [BKS 93], [BSS 00], [DS 00]
    M2  | L.A.       | rivers and railways | 128,971        | 1.99             | 0.22     | Tiger  | [BKS 93], [BSS 00], [DS 00]
    M3  | California | streets             | 1,888,012      | 16.93            | 0.12     | Tiger  | [BSS 00], [DS 00]
    M4  | California | railways            | 625,640        | 0.33             | 0.21     | Tiger  |
    M5  | California | borders             | 234,251        | 2.82             |          | Tiger  |
    M6  | California | hydrography         | 360,330        | 4.12             |          | Tiger  |

  • The temporal data we used in our experiments with the multi-version B-tree is here (small size 100'000).
  • Datasets for MVBT experiments

    Description | Format description
    d50, u0, u25, u50, u75, u100 | Each ASCII file contains a sequence of 10'000'000 blank-separated triples of the form: operation type, long key, integer payload. Operation type 1 denotes an insert, 2 an update, and 3 a delete. The first 1'000'000 operations (10% of the data set) are insertions. The remaining 90% of the file is a mix of insertions, deletions, and updates; the share of the specific operation is encoded in the file name. For example, the file d50 consists of 1'000'000 insert operations followed by a mix of insertions (4'500'000) and deletions (4'500'000). The file u75 consists of 1'000'000 insert operations followed by a mix of insertions (2'250'000) and updates (6'750'000). Note that the default payload value is 0; please replace it as needed. In our experiments we replaced it with a 16-byte payload. Version numbers are generated while reading the file line by line.
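
    The operation format described above can be read with a few lines of code. The following is a minimal sketch in Python, not the loader used in the experiments. The names `Operation`, `parse_operations`, and `operation_mix` are hypothetical, and numbering versions from 1 in line order is an assumption; the page only states that version numbers are generated while reading the file line by line.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# Operation codes as given in the format description:
# 1 = insert, 2 = update, 3 = delete.
OPERATIONS = {1: "insert", 2: "update", 3: "delete"}


class Operation(NamedTuple):
    kind: str     # "insert", "update" or "delete"
    key: int      # long key from the file
    payload: int  # integer payload (0 by default in the files)
    version: int  # assumption: 1-based line counter used as version number


def parse_operations(lines: Iterable[str]) -> Iterator[Operation]:
    """Yield one Operation per non-empty line of blank-separated triples."""
    version = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines, e.g. at the end of the file
        op_code, key, payload = (int(field) for field in fields)
        version += 1
        yield Operation(OPERATIONS[op_code], key, payload, version)


def operation_mix(lines: Iterable[str]) -> Counter:
    """Count how often each operation type occurs."""
    return Counter(op.kind for op in parse_operations(lines))
```

    Passing an open text file handle (for example one opened on a local copy of d50) to `operation_mix` gives a quick check that the counts match the mix stated in the file name.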

  • Results of Spatial Joins
    Description | Number of results | Format description | Zipped size (MB)
    M2 & M1 | 85,854 | M2.ID M1.ID (the MBRs of M2 are numbered starting with 10,000,000, the MBRs of M1 with 0) | 0.37
    M3 & M3 | 9,784,072 | |
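
    Because the M2 identifiers in the join result start at 10,000,000 while the M1 identifiers start at 0, a reader may want to map each pair back to per-dataset numbers. The sketch below assumes one whitespace-separated pair of decimal IDs per line, which the page does not state explicitly; `parse_join_pair` is a hypothetical helper name.

```python
M2_ID_OFFSET = 10_000_000  # M2 IDs start here; M1 IDs start at 0


def parse_join_pair(line: str) -> tuple[int, int]:
    """Split one result line 'M2.ID M1.ID' into (M2 number, M1 number).

    The M2 number is the ID with the 10,000,000 offset removed.
    Assumption: IDs are decimal and separated by whitespace.
    """
    m2_id, m1_id = (int(field) for field in line.split())
    if m2_id < M2_ID_OFFSET:
        raise ValueError(f"expected an M2 ID >= {M2_ID_OFFSET}, got {m2_id}")
    return m2_id - M2_ID_OFFSET, m1_id
```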

  • Results of k-nearest neighbors
    Description | Format description | Zipped size (MB)
    20 nearest neighbors for each element in M2 | M2.ID M1.ID k 'Euclidean distance' (the centers of the MBRs were used for the computation; the MBRs of M2 are numbered starting with 10,000,000, the MBRs of M1 with 0) | 33.79
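
    The distance column is computed between MBR centers. As an illustration of that definition only, the following brute-force sketch computes center-to-center Euclidean distances and picks the k closest candidates. It is not the algorithm used to produce the result file, and the tuple layout (xlow, ylow, xhigh, yhigh) as well as the function names are assumptions made for this example.

```python
import heapq
import math

Rect = tuple[float, float, float, float]  # assumed layout: xlow, ylow, xhigh, yhigh


def center(mbr: Rect) -> tuple[float, float]:
    """Return the center point of a minimum bounding rectangle."""
    xlow, ylow, xhigh, yhigh = mbr
    return ((xlow + xhigh) / 2.0, (ylow + yhigh) / 2.0)


def center_distance(a: Rect, b: Rect) -> float:
    """Euclidean distance between the centers of two MBRs."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)


def k_nearest(query: Rect, candidates: dict[int, Rect], k: int) -> list[tuple[float, int]]:
    """Return the k closest candidates as (distance, id), nearest first.

    Brute force: every candidate is inspected once.
    """
    scored = ((center_distance(query, mbr), mbr_id) for mbr_id, mbr in candidates.items())
    return heapq.nsmallest(k, scored)
```

    For data sets of the size listed on this page, a spatial index would normally replace the linear scan; the sketch only pins down what the distance values mean.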

  • Datasets for Sort-based Parallel R-tree loading ([ASSS 2012])

    Description | Format description | Size (MB)
    USA-data | Contains the minimum bounding rectangles of all streets from the TIGER files, 72 million rectangles in total. The file is in Hadoop sequence file format with entries of type <NullWritable, DoublePointRectangle>. For convenience, we also provide a plain data set consisting of rectangles in the format <xlow, ylow, xhigh, yhigh>, each coordinate occupying 8 bytes in double-precision floating-point format. The file can be obtained from here. | 3338
    E-USA-data | Extended USA data set, composed of four copies of USA-data obtained by translating the original data set by the following vectors: (0.0, 0.0), (75.5, -33.9), (0.0, -33.9), (75.5, -3.9). A plain file can be obtained from here (see USA-data for the file formats). | 13414
    qr1 | Query point data set obtained by taking the center point of every 100th rectangle from USA-data, consisting of 722,261 points. The files are in plain format: sequential point data with <x, y> coordinates, each coordinate occupying 8 bytes in double-precision floating-point format. | 22
    qr2 | Query rectangle data set of square rectangles, where each rectangle returns 100 results on average, consisting of 722,226 rectangles. The file format is the same as for qr1. | 2.2
    qr3 | Query rectangle data set of square rectangles, where each rectangle returns 1000 results on average, consisting of 22,856 rectangles. The file format is the same as for qr1. | 0.7
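
    The plain files store fixed-size binary records: four 8-byte doubles per rectangle and two per point. The sketch below shows one way to read such records and to translate a rectangle by a vector, as done for E-USA-data. The byte order is not stated on this page; big-endian is assumed here because that is what Java's DataOutputStream writes, so switch the format prefix to "<" if the values look wrong. All names in the block are hypothetical.

```python
import struct
from typing import BinaryIO, Iterator

# Assumption: big-endian doubles (">"); the page does not specify the byte order.
RECTANGLE = struct.Struct(">4d")  # xlow, ylow, xhigh, yhigh (32 bytes)
POINT = struct.Struct(">2d")      # x, y (16 bytes)


def read_records(stream: BinaryIO, record: struct.Struct) -> Iterator[tuple]:
    """Yield fixed-size records from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(record.size)
        if not chunk:
            return
        if len(chunk) != record.size:
            raise ValueError("file ends in the middle of a record")
        yield record.unpack(chunk)


def translate(rect: tuple, vector: tuple) -> tuple:
    """Shift a rectangle (xlow, ylow, xhigh, yhigh) by a vector (dx, dy)."""
    dx, dy = vector
    xlow, ylow, xhigh, yhigh = rect
    return (xlow + dx, ylow + dy, xhigh + dx, yhigh + dy)
```

    Opening a local copy of the plain rectangle file in binary mode and passing it to `read_records` together with `RECTANGLE` streams the rectangles one at a time, which avoids loading the multi-gigabyte file into memory; `POINT` does the same for qr1.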

     

     



    dittrich@mathematik.uni-marburg.de