A Control-Theory Approach for Cluster Autonomic Management: Maximizing Usage While Avoiding Overload

Agustín Gabriel Yabo1, Olivier Richard2, Bruno Bzeznik2, Bogdan Robu, Eric Rutten3

  • 1INRIA
  • 2Univ. Grenoble Alpes
  • 3INRIA Grenoble - Rhone-Alpes

Details

Category

Regular Session

Sessions

10:30 - 12:30 | Mon 19 Aug | Lau, 6-213 | MoA6

Predictive Control 1

Abstract

In data centers, Cloud and HPC (High-Performance Computing) systems increasingly vary in their behavior, particularly in performance and power consumption, and their growing unpredictability demands more runtime management. In this work, we describe results addressing autonomic administration in HPC systems for scientific workflow management through a systems and control theory approach. We propose a model described by specific parameters related to the key aspects of the infrastructure, from the Computer Science point of view, thus achieving a deterministic dynamical representation that accounts for the varying behaviors of the real computing system. We then propose a simple model-predictive control loop to achieve two different objectives: a) maximize cluster utilization by best-effort jobs, and b) control the file server's load due to the impact of the jobs. The accuracy of the prediction relies on a parameter estimation scheme based on the well-known EKF (Extended Kalman Filter) to adjust the predictive model to the real system, making the approach adaptive to parametric variations in the infrastructure. We show an average performance improvement of 8%, and consequently a reduction in the total computation time, when implementing the closed-loop strategy in the real system. The problem is addressed in a general way, to allow implementation on similar HPC platforms, as well as scalability to different infrastructures.
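
As an illustration of the loop described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical first-order load model, load[k+1] = a*load[k] + b*u[k], where u is the number of best-effort jobs admitted, a is an assumed-known decay factor, and b is an unknown per-job load gain. The names `ekf_step`, `choose_jobs`, and `run_demo`, and all numeric values, are invented for the example. With a known, the augmented model is linear in the state, so the filter below coincides with a time-varying Kalman filter; estimating a as well would make the Jacobian state-dependent, which is where the EKF form becomes necessary. The controller uses a one-step prediction horizon, the simplest case of a predictive scheme.

```python
import numpy as np

A_KNOWN = 0.5  # hypothetical, assumed-known load decay factor


def ekf_step(x, P, u, y_meas, Q, R):
    """One predict/update cycle for the augmented state x = [load, b]."""
    load, b = x
    # Predict with the assumed model: load' = a*load + b*u, b' = b.
    x_pred = np.array([A_KNOWN * load + b * u, b])
    F = np.array([[A_KNOWN, u], [0.0, 1.0]])  # Jacobian of the model w.r.t. x
    P_pred = F @ P @ F.T + Q
    # Update: only the file-server load is measured.
    H = np.array([[1.0, 0.0]])
    S = H @ P_pred @ H.T + R          # innovation covariance, shape (1, 1)
    K = P_pred @ H.T / S              # Kalman gain, shape (2, 1)
    innovation = y_meas - x_pred[0]
    x_new = x_pred + K[:, 0] * innovation
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new


def choose_jobs(load, b_hat, load_ref, u_max):
    """One-step predictive control: pick u so the predicted load hits load_ref."""
    if b_hat <= 0.0:
        return 0.0  # no usable gain estimate yet: admit nothing
    u = (load_ref - A_KNOWN * load) / b_hat
    return float(np.clip(u, 0.0, u_max))  # respect cluster capacity


def run_demo(steps=50, b_true=0.1, load_ref=4.0, u_max=50.0):
    """Closed loop on a noiseless simulated plant (hypothetical values)."""
    load = 0.0
    x = np.array([0.0, 0.05])         # initial guess: b is underestimated
    P = np.eye(2)
    Q = np.diag([1e-4, 1e-6])         # small process noise keeps b adaptable
    R = 0.01
    for _ in range(steps):
        u = choose_jobs(load, x[1], load_ref, u_max)
        load = A_KNOWN * load + b_true * u   # simulated plant response
        x, P = ekf_step(x, P, u, load, Q, R)
    return x[1], load
```

Calling `run_demo()` returns the final gain estimate and the final load. In this noiseless setting the estimate should settle near the simulated gain and the load near the reference, while the clip in `choose_jobs` keeps the admitted job count within capacity.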
