The timely processing of continuous data streams is gaining importance in a variety of fields. Self-healing systems depend on efficient data analysis to detect problems and apply appropriate countermeasures. This paper introduces Bitflow, a stream processing framework optimized for data analysis tasks in self-healing IT systems. Numerous algorithmic contributions enable mining monitoring data obtained from critical system components in order to detect and classify anomalies and to localize their root causes. Such data analysis tasks are traditionally executed on big data processing platforms, which run on dedicated hosts and assume complete ownership of the occupied resources. Bitflow takes a different approach by analyzing monitoring data directly at its source, i.e., in situ. We exploit the fact that IT systems are usually over-provisioned, so a fraction of the computational resources can be allocated to self-healing functionality. Bitflow implements a dynamic modeling approach for dataflow graphs, which adapts to varying environments such as changing data sources or system components. Furthermore, we describe Bitflow's scheduling approach, which determines when it is beneficial to migrate a data analysis process to a remote host. Experimental data from practical data analysis tasks demonstrates the applicability of our scheduling solution.
2020