PIPES - a Public Infrastructure for Processing and Exploring Streams

Latest News

Note that further development of the PIPES project is carried out by RTM Realtime Monitoring GmbH.

Motivation

Over the coming years, a tremendous number of sensors will be installed in our environment. More and more data is delivered continuously from these devices as streams. In general, a large number of streams is required to provide the desired information, and each stream outputs a large number of data items. Ideally, users pose ad-hoc queries on streams, much as they would in a traditional DBMS. There are, however, fundamental differences: a query runs until the user explicitly stops it, and streaming data items are generally valid only for a short period of time. This leads to a substantial change in query processing. Systems for streams are therefore primarily designed for the management of queries, of which there might be millions running simultaneously, whereas the data items of the streams are kept in the system only temporarily.

To address the issue of efficiently processing active data streams, we are currently developing PIPES, a flexible and extensible infrastructure that provides fundamental building blocks for implementing a data stream management system (DSMS). It is seamlessly integrated into the Java library XXL for advanced query processing and extends XXL's scope towards continuous, data-driven query processing over autonomous data sources.

Research Topics

In PIPES, we address the following research issues:

People

The following people are the primary developers of PIPES:

Faculty: Prof. Dr. Bernhard Seeger

Graduate: Michael Cammert, Christoph Heinz (DFG), Jürgen Krämer (DFG), Tobias Riemenschneider

Undergraduate: Tim Dörnemann (DFG), Maxim Schwarzkopf (DFG), Sonny Vaupel (DFG)

Additionally, we want to thank everyone not listed above who actively supported the development of PIPES with ideas, programming, and master's theses.

Download

Visit our XXL download page to get the latest version of PIPES.

Overview

The PIPES infrastructure is composed of several interacting frameworks. The core framework allows the construction of directed acyclic query graphs based on a publish-subscribe mechanism integrated into the graph nodes. We distinguish between three types of graph nodes. A source transfers its elements to a set of subscribed sinks. A sink can subscribe to, and unsubscribe from, multiple sources. While subscribed, it processes all incoming elements delivered by its sources. All operators implement the interface pipe, which combines the functionality of a sink and a source. Hence, a pipe processes its incoming elements and transfers its outgoing elements to all subscribed sinks.
A powerful and generic operator algebra provides the basis for the query construction framework. It relies on a well-defined, deterministic temporal semantics over time intervals that guarantees snapshot-equivalence. As a result, PIPES covers the functionality of the Continuous Query Language (CQL). Our operator algebra supports all operations known from the extended relational algebra except sorting. The operators are independent of concrete data types because they are parameterized with user-defined functions and predicates. Their data-driven implementation relies on time-based sliding windows and incremental evaluation to ensure non-blocking behavior. The current state of an operator is maintained in so-called SweepAreas, highly dynamic data structures with efficient insertion, retrieval, and reorganization capabilities that manage the content of a sliding window. By adjusting the window size, the resource requirements of an operator can be controlled at runtime.
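
The source, sink, and pipe roles described above can be illustrated with a minimal Java sketch. All interface and class names here (Sink, Source, AbstractPipe, FilterPipe, CollectingSink) are hypothetical and chosen only for illustration; they are not the actual XXL or PIPES API, and the sketch leaves out timestamps, windows, and thread safety.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical interfaces for illustration only; not the XXL/PIPES API.
interface Sink<T> {
    void process(T element);
}

interface Source<T> {
    void subscribe(Sink<? super T> sink);
    void unsubscribe(Sink<? super T> sink);
}

// A pipe is both a sink (it receives elements) and a source (it forwards results).
abstract class AbstractPipe<I, O> implements Sink<I>, Source<O> {
    private final List<Sink<? super O>> sinks = new ArrayList<>();

    public void subscribe(Sink<? super O> sink) { sinks.add(sink); }

    public void unsubscribe(Sink<? super O> sink) { sinks.remove(sink); }

    // Push an outgoing element to every currently subscribed sink.
    protected void transfer(O element) {
        for (Sink<? super O> sink : new ArrayList<>(sinks)) {
            sink.process(element);
        }
    }
}

// Selection operator parameterized with a user-defined predicate.
class FilterPipe<T> extends AbstractPipe<T, T> {
    private final Predicate<? super T> predicate;

    FilterPipe(Predicate<? super T> predicate) { this.predicate = predicate; }

    public void process(T element) {
        if (predicate.test(element)) transfer(element);
    }
}

// Terminal sink that simply collects what it receives.
class CollectingSink<T> implements Sink<T> {
    final List<T> received = new ArrayList<>();

    public void process(T element) { received.add(element); }
}

class PubSubSketch {
    public static void main(String[] args) {
        FilterPipe<Integer> fast = new FilterPipe<Integer>(speed -> speed > 100);
        CollectingSink<Integer> out = new CollectingSink<Integer>();
        fast.subscribe(out);
        for (int speed : new int[] {80, 120, 95, 130}) fast.process(speed);
        System.out.println(out.received); // prints [120, 130]
    }
}
```

Because the filter receives its predicate as a parameter, the same operator class works for any element type, which is the kind of independence from concrete data types the text describes.
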
To support building a DSMS, PIPES also contains separate frameworks that form the basis for other essential runtime components, namely the scheduler, the memory manager, and the query optimizer. These components are controlled by flexible strategies that typically draw on operator metadata maintained at runtime, e.g. mean stream rates or operator selectivities. To this end, each node in a query graph continuously collects and updates the relevant metadata online.
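
As a rough illustration of metadata maintained online, the following sketch tracks an operator's selectivity and mean input rate with a few counters. The class OperatorMetadata and its methods are assumptions made for this example, not part of PIPES; timestamps are passed in explicitly so that the behavior is deterministic.

```java
// Hypothetical metadata holder; the names are illustrative and not taken from PIPES.
class OperatorMetadata {
    private long inputCount;
    private long outputCount;
    private long firstTimestamp = -1;
    private long lastTimestamp = -1;

    // Called for every incoming element with its arrival time in milliseconds.
    void recordInput(long timestampMillis) {
        if (inputCount == 0) firstTimestamp = timestampMillis;
        lastTimestamp = timestampMillis;
        inputCount++;
    }

    // Called for every element the operator emits.
    void recordOutput() { outputCount++; }

    // Outputs per input so far (0 if nothing has arrived yet).
    double selectivity() {
        return inputCount == 0 ? 0.0 : (double) outputCount / inputCount;
    }

    // Mean input rate in elements per second over the observed time span.
    double meanInputRate() {
        long span = lastTimestamp - firstTimestamp;
        return span <= 0 ? 0.0 : (inputCount - 1) * 1000.0 / span;
    }
}

class MetadataSketch {
    public static void main(String[] args) {
        OperatorMetadata meta = new OperatorMetadata();
        // Five elements arrive 500 ms apart; a filter lets elements 0, 2, and 4 pass.
        for (int i = 0; i < 5; i++) {
            meta.recordInput(i * 500L);
            if (i % 2 == 0) meta.recordOutput();
        }
        System.out.println("selectivity = " + meta.selectivity());
        System.out.println("mean input rate = " + meta.meanInputRate() + " elements/s");
    }
}
```

Both values are updated in constant time per element, so a scheduler or optimizer strategy could read them at any moment without scanning the stream.
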
The scheduling framework provided by PIPES consists of a 3-layer architecture. In the lowest layer, multiple successive nodes in a query graph can be combined into a virtual node with inherently push-based processing. This novel approach substantially reduces the communication overhead since it does not require any inter-operator queues. The middle layer schedules virtual nodes within a single thread by decoupling successive virtual nodes with an operator that buffers elements. In the top layer, all threads of the middle layer are scheduled. This hybrid approach has the advantage that it is neither restricted to a single thread for scheduling nor forced to assign a separate thread to each operator. Due to the dynamic system load caused by varying stream rates and frequent query graph modifications, scalability may suffer from limited memory resources.
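
The two lower scheduling layers described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PIPES scheduler: the names BufferNode, RoundRobinScheduler, and fuse are hypothetical, round-robin is just one possible strategy, and the top layer (scheduling several such threads) is left out to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Buffer operator that decouples two virtual nodes (hypothetical name).
class BufferNode<T> implements Consumer<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final Consumer<T> next;

    BufferNode(Consumer<T> next) { this.next = next; }

    // Upstream only enqueues; downstream work happens when the scheduler drains.
    @Override
    public void accept(T element) { queue.add(element); }

    // Forward up to batchSize buffered elements and report how many were forwarded.
    int drain(int batchSize) {
        int forwarded = 0;
        while (forwarded < batchSize && !queue.isEmpty()) {
            next.accept(queue.poll());
            forwarded++;
        }
        return forwarded;
    }
}

// Middle layer: one thread serves several buffers in round-robin order.
class RoundRobinScheduler {
    private final List<BufferNode<?>> buffers = new ArrayList<>();

    void register(BufferNode<?> buffer) { buffers.add(buffer); }

    // Keep cycling over the buffers until none of them has elements left.
    void runUntilIdle(int batchSize) {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (BufferNode<?> buffer : buffers) {
                if (buffer.drain(batchSize) > 0) progress = true;
            }
        }
    }
}

class SchedulingSketch {
    // Lowest layer: fuse an operator with its successor into one direct call chain,
    // so no queue sits between them.
    static <T> Consumer<T> fuse(UnaryOperator<T> operator, Consumer<T> successor) {
        return element -> successor.accept(operator.apply(element));
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<>();
        Consumer<Integer> collect = results::add;
        UnaryOperator<Integer> plusOne = x -> x + 1;
        UnaryOperator<Integer> twice = x -> x * 2;

        // Second virtual node (plusOne -> collect) sits behind a buffer.
        BufferNode<Integer> buffer = new BufferNode<Integer>(fuse(plusOne, collect));
        // First virtual node (twice -> buffer) pushes directly into that buffer.
        Consumer<Integer> firstVirtualNode = fuse(twice, buffer);

        RoundRobinScheduler scheduler = new RoundRobinScheduler();
        scheduler.register(buffer);

        firstVirtualNode.accept(1);
        firstVirtualNode.accept(2);
        System.out.println(results); // prints [] because the elements wait in the buffer
        scheduler.runUntilIdle(1);
        System.out.println(results); // prints [3, 5]
    }
}
```

Inside a virtual node an element travels through plain method calls, and only the buffer between virtual nodes introduces a queue, which is where the scheduler gains control.
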
To cope with limited memory under such a dynamic load, the memory manager offers a flexible framework for adaptive memory management. It globally assigns and redistributes the available memory among stateful operators at runtime. Furthermore, it supports approximate query answers by applying user-defined load-shedding techniques whenever an operator exceeds its memory consumption limit.
The query optimizer is based on the semantics of our temporal operator algebra, which retains most traditional transformation rules. We are therefore currently developing a rule-based query optimizer that is able to integrate a new continuous query into the running query graph while sharing subgraphs that are as large as possible. This is achieved by heuristically choosing, from a set of snapshot-equivalent query plans, the best one according to a cost model.
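
The interplay of window state, memory budgets, and load shedding in the memory manager can be illustrated with a small sketch. Everything here is assumed for illustration: WindowState is a simplified stand-in for an operator's window content rather than the PIPES SweepArea API, the budget is counted in elements, the manager splits it evenly, and shedding drops the oldest elements first, which is only one of many strategies a user might define.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the state of a windowed operator; not the PIPES SweepArea API.
class WindowState {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowSize;
    private int maxElements = Integer.MAX_VALUE;
    private long shedCount;

    WindowState(long windowSize) { this.windowSize = windowSize; }

    // Insert an element; timestamps are assumed to arrive in non-decreasing order.
    void insert(long timestamp) {
        // Reorganization: drop elements whose validity [t, t + windowSize) has ended.
        while (!timestamps.isEmpty() && timestamps.peekFirst() + windowSize <= timestamp) {
            timestamps.pollFirst();
        }
        timestamps.addLast(timestamp);
        enforceLimit();
    }

    // Called by the memory manager to assign a new budget.
    void setMaxElements(int maxElements) {
        this.maxElements = maxElements;
        enforceLimit();
    }

    // Load shedding (assumed strategy): discard the oldest elements first.
    private void enforceLimit() {
        while (timestamps.size() > maxElements) {
            timestamps.pollFirst();
            shedCount++;
        }
    }

    int size() { return timestamps.size(); }

    long shedCount() { return shedCount; }
}

class MemoryManagerSketch {
    // Split the total budget evenly among all stateful operators.
    static void redistribute(List<WindowState> operators, int totalBudget) {
        if (operators.isEmpty()) return;
        int share = totalBudget / operators.size();
        for (WindowState operator : operators) operator.setMaxElements(share);
    }

    public static void main(String[] args) {
        WindowState a = new WindowState(100);
        WindowState b = new WindowState(100);
        redistribute(Arrays.asList(a, b), 4); // each operator may keep 2 elements
        for (long t = 0; t < 30; t += 10) a.insert(t);
        System.out.println(a.size() + " kept, " + a.shedCount() + " shed"); // prints 2 kept, 1 shed
    }
}
```

Expired elements are removed without counting as shed, since they no longer affect the query result; only elements dropped to stay within the budget make the answer approximate.
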

Demo

We have developed an online version of the PIPES demo that we presented at SIGMOD 2004. This demo shows how PIPES can be used to execute continuous queries in two completely different application domains, namely traffic management and online auctions. Both scenarios are realistic, as they resemble or rely on benchmarks that are currently being developed. Running the demo requires Java Web Start. See our help file for additional information.

Please click here to start our demo of PIPES.

XXL-Viewer

The XXL-Viewer offers an intuitive way of visual Java programming. We presented this powerful tool at the SIGMOD 2004 conference. Feel free to start our online version. To become familiar with the XXL-Viewer, read our tutorial (soon also available in English), which describes the basic steps.

Since our library XXL is developed entirely in Java, the XXL-Viewer provides easy access to the functionality of XXL. Just add xxl.jar to the classpath by calling "Add jar" in the explorer of the XXL-Viewer and you can explore XXL.

As a first step, examine a simple AWT example by opening this file with the XXL-Viewer. A more complex example modeling a continuous query over traffic data is illustrated here. Note that running the XXL-Viewer requires Java Web Start.

Acknowledgements

This work has been supported by the German Research Foundation (DFG) under grant no. SE 553/4-1,2,3.