
Develop Implementation Plan for evaluating downstream energy impacts of scalable workloads #151

Open · SRF-Audio opened this issue Dec 12, 2024 · 1 comment
SRF-Audio commented Dec 12, 2024

Objective:

Create a set of tests, metrics, and outputs that represent the energy delta between:

  • a baseline minimum deployment test workload
  • a production-level HA version of that workload
  • Optional: define a metric, analogous to big-O notation, that represents how an HA workload's energy delta from baseline scales with the size of the workload (sketched below)
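
As a rough illustration of that optional metric, here is a minimal Python sketch that computes the HA-vs-baseline energy delta and fits a power-law exponent to it. The replica counts and joule figures are hypothetical, and the power-law model itself is an assumption, not a settled choice:

```python
"""Sketch of the proposed metrics, using made-up numbers.

Assumes per-run energy totals (in joules) have already been collected for
the baseline and HA deployments; the values below are illustrative, not
measured data.
"""
import math

def energy_delta(baseline_joules: float, ha_joules: float) -> float:
    """Absolute energy overhead of the HA deployment over the baseline."""
    return ha_joules - baseline_joules

def scaling_exponent(replicas: list[int], deltas: list[float]) -> float:
    """Fit delta ~ C * replicas**k by least squares in log-log space.

    The exponent k is the big-O-style figure: k close to 1 means the HA
    overhead grows roughly linearly with scale, k > 1 superlinearly.
    """
    xs = [math.log(r) for r in replicas]
    ys = [math.log(d) for d in deltas]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )

# Hypothetical measurements: HA energy delta (J) at increasing replica counts.
replicas = [2, 4, 8, 16]
deltas = [120.0, 260.0, 540.0, 1150.0]
print(f"scaling exponent k = {scaling_exponent(replicas, deltas):.2f}")  # ~1.08, roughly linear
```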

Example Scenario:

Suppose I want to deploy a test instance of Grafana. A basic/naive implementation might be:

  • Deployment (just pointing to the public Docker image)
  • ClusterIP Service
  • PVC

But for production-like environments, a team typically uses the Grafana Helm chart, which adds (see the sketch after this list):

  • ReplicaSets
  • ConfigMaps
  • RBAC
  • Endpoints
  • Secrets
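
A quick way to see this resource delta in practice is to render the chart with `helm template` and count objects per kind. This is a sketch, not part of the benchmark itself: it assumes `helm` is on PATH and that the real `grafana/grafana` chart repo has been added (`helm repo add grafana https://grafana.github.io/helm-charts`); exact counts will vary by chart version and values:

```python
"""Count the Kubernetes objects a naive manifest vs. the rendered
Grafana Helm chart would create."""
import subprocess
from collections import Counter

import yaml  # PyYAML

def rendered_kinds(release: str, chart: str) -> Counter:
    """Render a chart with `helm template` and tally objects per kind."""
    manifest = subprocess.run(
        ["helm", "template", release, chart],
        check=True, capture_output=True, text=True,
    ).stdout
    docs = [d for d in yaml.safe_load_all(manifest) if d]
    return Counter(d.get("kind", "Unknown") for d in docs)

# The basic/naive implementation from the scenario above.
naive = Counter({"Deployment": 1, "Service": 1, "PersistentVolumeClaim": 1})
chart = rendered_kinds("grafana", "grafana/grafana")
print("extra objects introduced by the chart:", chart - naive)
```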

Perhaps the team also uses additional tooling on top of this, such as auto-scaling tools.

If the benchmarks we collect only measure a basic/naive implementation, we risk underestimating a workload's total energy impact when it is used at scale, because of those additional supporting compute/memory/storage resources. An HA configuration carries some additional compute/memory/network overhead to manage data consistency, queuing, load balancing, and so on.

Even if these additional Kubernetes resource differences are locally small on a single node, they aggregate at large scale: an overhead of just a few watts per replica becomes kilowatts across thousands of replicas. Each auto-scaling tool also consumes resources of its own while monitoring scaling triggers and executing scaling events.

Required Research:

  • For a given tool, use the tool's official documentation to determine its recommended deployment models, specifically distinguishing HA/production paradigms from single-node/local/test deployments
    • Create an HA-specific benchmark evaluation for that tool
    • If possible, identify common HA paradigms across CNCF ecosystem tools, so that we have something generic enough to account for many workloads' common HA configurations
  • Determine if we are able to use the Power Capping Framework for control plane and/or worker nodes that we are running benchmarks on.
    • If yes, create a list of required outputs from PCF to gather for tests
  • Create a list of Prometheus node metrics that would provide the data needed for this evaluation (see the sketch after this list)
  • Create/evaluate Kubernetes control plane and worker node baselines to compare against the HA workload delta
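
As a starting point for the Prometheus item above, here is a minimal sketch that pulls a per-run energy total over the Prometheus HTTP API (`/api/v1/query`). The endpoint, the run-window timestamps, and the `kepler_node_package_joules_total` metric name are assumptions; Kepler exposes node energy counters along these lines, but verify the exact name against whatever energy exporter the benchmark nodes actually run:

```python
"""Pull per-run energy totals from Prometheus and compute the HA delta."""
import requests

PROM_URL = "http://localhost:9090"  # placeholder endpoint, adjust to your setup

def joules_over_window(promql: str, end_unix: float) -> float:
    """Evaluate an instant PromQL query at `end_unix`, return the first sample."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query",
        params={"query": promql, "time": end_unix},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

# Total node package energy over a 30m benchmark window ending at the given time.
QUERY = "sum(increase(kepler_node_package_joules_total[30m]))"

baseline = joules_over_window(QUERY, end_unix=1734000000)  # hypothetical baseline run
ha = joules_over_window(QUERY, end_unix=1734010000)        # hypothetical HA run
print(f"HA energy delta: {ha - baseline:.1f} J")
```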

Desired Outcome:

Any tool that goes through our benchmarking can see both its core workload's energy performance and the delta showing how its recommended deployment paradigms and auto-scaling settings affect its energy footprint.
