Continuous state estimation for heterogeneous Hadoop clusters
Hadoop is a popular and extremely successful framework for horizontally scalable distributed computing over large data sets, based on the MapReduce programming model. We present a tool that monitors the real-time performance of every node in a heterogeneous Hadoop cluster. The performance of a node is measured in terms of slowdown. The tool is designed to help system administrators detect underperforming nodes, and it also helps identify which resource (CPU or disk) on a node is affected by the problem. In its current implementation, Hadoop assumes a homogeneous cluster of compute nodes. This assumption is manifest in Hadoop's scheduling algorithms and is also crucial to existing approaches for detecting performance issues, which rely on peer similarity between nodes. It is desirable to enable efficient use of Hadoop on heterogeneous clusters as well as on virtual/cloud infrastructure, both of which violate the peer-similarity assumption. We have implemented the monitoring tool and present preliminary results on an eight-node heterogeneous Hadoop cluster at PARC. We show that, using our tool, a system administrator can detect resource-specific performance problems (e.g., CPU contention, disk I/O contention) in a node.
Gupta, S.; Fritz, C.; Price, R.; Hoover, R.; de Kleer, J.; Witteveen, C. Continuous state estimation for heterogeneous Hadoop clusters. International Workshop on Principles of Diagnosis: DX-2013; 2013 October 1-4; Jerusalem, Israel.