Objective:
Create a set of tests, metrics, and outputs that represent the energy delta between:

- a workload's basic/naive deployment, and
- the same workload's recommended production/HA deployment, including its auto-scaling behavior.

Example Scenario:
Suppose I want to deploy a test instance of Grafana. A basic/naive implementation might be a single-replica Deployment with default settings and no persistence. But for production-like environments, a team uses the Grafana Helm chart, which adds things like:

- multiple replicas for high availability
- persistent storage
- a shared database to keep state consistent across replicas
- load balancing across the replicas

Perhaps the team also uses things like a HorizontalPodAutoscaler or the Cluster Autoscaler. The sketch below gives one rough way to see how much more a production-style rendering deploys.
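As an illustration only, here is a minimal sketch that diffs the number of Kubernetes objects rendered by the chart's defaults versus an HA-style values set. It assumes the community grafana/grafana Helm chart and its `replicas`, `persistence.enabled`, and `ingress.enabled` values; the chart name and value keys are my assumptions, not something this issue specifies.

```python
#!/usr/bin/env python3
"""Sketch: count Kubernetes objects rendered by a naive vs. HA-style
Grafana Helm release. Assumes `helm` is installed and the grafana repo
has been added (helm repo add grafana https://grafana.github.io/helm-charts)."""
import subprocess


def rendered_kinds(helm_args: list[str]) -> dict[str, int]:
    """Run `helm template` and count rendered resource kinds."""
    manifest = subprocess.run(
        ["helm", "template", "grafana-bench", "grafana/grafana", *helm_args],
        capture_output=True, text=True, check=True,
    ).stdout
    counts: dict[str, int] = {}
    for line in manifest.splitlines():
        # A top-level `kind:` line marks one rendered Kubernetes object.
        if line.startswith("kind: "):
            kind = line.split(": ", 1)[1].strip()
            counts[kind] = counts.get(kind, 0) + 1
    return counts


naive = rendered_kinds([])  # chart defaults: single replica, no persistence
ha = rendered_kinds([
    "--set", "replicas=3",            # assumed HA values, for illustration
    "--set", "persistence.enabled=true",
    "--set", "ingress.enabled=true",
])

for kind in sorted(set(naive) | set(ha)):
    print(f"{kind:30} naive={naive.get(kind, 0)}  ha={ha.get(kind, 0)}")
```

The point is not the exact numbers but that the HA rendering schedules more objects, each of which draws compute, memory, and storage during a benchmark run.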
If the benchmarks we collect only test metrics in a basic/naive implementation, we risk underestimating a workload's total energy impact when it is used at scale, because of those additional supporting compute/memory/storage resources. In an HA configuration, there is some additional compute/memory/network overhead, call it X, spent managing data consistency, queuing, load balancing, etc.
Even if these additional Kubernetes resource differences are locally small on a single node, at large scale these small differences add up. Each auto-scaling tool also consumes resources while it monitors scaling triggers and executes scaling events. The sketch below shows one way X could be measured directly.
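A minimal sketch of that measurement, with several assumptions on my part: Kepler is exporting `kepler_container_joules_total`, Prometheus is reachable at a placeholder URL, and the namespaces `grafana-naive` and `grafana-ha` are hypothetical stand-ins for the two deployments under test.

```python
#!/usr/bin/env python3
"""Sketch: estimate the HA energy delta (X) from per-namespace energy
counters via the Prometheus HTTP API."""
import requests

PROM_URL = "http://localhost:9090"  # placeholder Prometheus endpoint


def joules_over_window(namespace: str, window: str = "1h") -> float:
    """Sum the energy (joules) used by all containers in a namespace
    over the window, using increase() on the Kepler counter."""
    query = (
        f'sum(increase(kepler_container_joules_total'
        f'{{container_namespace="{namespace}"}}[{window}]))'
    )
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return float(results[0]["value"][1]) if results else 0.0


naive_j = joules_over_window("grafana-naive")
ha_j = joules_over_window("grafana-ha")
# X: the extra energy the HA paradigm spends on consistency, queuing,
# load balancing, autoscaler control loops, etc.
print(f"naive={naive_j:.0f} J  ha={ha_j:.0f} J  delta(X)={ha_j - naive_j:.0f} J")
```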
Required Research:
- For a given tool, use the tool's official documentation to determine its recommended deployment models, specifically looking for HA/production paradigms vs. single-node/local/test deployments.
- Create an HA-specific benchmark evaluation for that tool.
- If possible, find common HA paradigms across CNCF ecosystem tools, so that we have something generic enough to account for the common HA configurations of many workloads.
- Determine whether we can use the Power Capping Framework on the control plane and/or worker nodes that we run benchmarks on.
  - If yes, create a list of required outputs to gather from PCF during tests.
- Create a list of Prometheus node metrics that would provide the data needed for this evaluation (see the sketch after this list for candidates).
- Create/evaluate k8s control plane + worker node baselines to compare against the HA workload delta.
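As a starting point for the last two items, here is a sketch of candidate node metrics and the baseline subtraction. The metric names are standard node_exporter/Kepler names as I understand them, but they are assumptions and should be verified against what the cluster actually scrapes.

```python
#!/usr/bin/env python3
"""Sketch: candidate Prometheus node metrics for this evaluation, plus
the baseline-vs-HA subtraction described in the list above."""

# Candidate node-level signals to record during each benchmark run.
NODE_METRICS = [
    "node_cpu_seconds_total",             # CPU time by mode, per core
    "node_memory_MemAvailable_bytes",     # memory headroom
    "node_network_transmit_bytes_total",  # HA replication / LB traffic
    "node_disk_written_bytes_total",      # persistence overhead
    "kepler_node_platform_joules_total",  # node-level energy (Kepler)
]


def baseline_delta(measured_joules: float, idle_node_joules: float) -> float:
    """Subtract the idle control-plane/worker-node baseline so the delta
    reflects the HA workload itself, not the cluster's standing cost."""
    return measured_joules - idle_node_joules


# Example with made-up numbers: a 1-hour HA run vs. the idle baseline.
print(baseline_delta(measured_joules=52_000.0, idle_node_joules=36_000.0))
```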
Desired Outcome:
Any tool that goes through our benchmarking can see both its core workload energy performance and the delta showing how its recommended deployment paradigms and auto-scaling settings affect its energy footprint.