The Problem
Running Kubernetes in production without observability is flying blind. Your pods can be crash-looping, nodes quietly running out of memory, or a single misbehaving service starving others of CPU — and you won’t know until a customer tells you. The difficulty isn’t that monitoring tools don’t exist; it’s that wiring Prometheus into a real Kubernetes cluster involves a non-obvious chain of moving parts: service discovery, scrape configs, metric cardinality, RBAC permissions, and alert routing — all of which must be correct simultaneously.
Get it wrong and you end up with a Prometheus that scrapes nothing, dashboards that show stale data, or alerts that fire for the wrong reasons at 3 AM. Get it right and you have a self-updating monitoring system that automatically discovers new services, tracks every pod’s resource consumption, and pages your team only when something actually needs human attention.
This tutorial wires up Prometheus, Grafana, and Alertmanager on a real cluster from scratch.
Tech Stack & Prerequisites
- Kubernetes v1.28+ — a running cluster (minikube, k3s, or a cloud-managed cluster like GKE/EKS/AKS)
- kubectl v1.28+ — configured and pointing at your target cluster (kubectl cluster-info should return cleanly)
- Helm v3.13+ — the package manager used to install the kube-prometheus-stack
- kube-state-metrics v2.10+ — ships automatically with the Helm chart
- node_exporter v1.7+ — also ships with the Helm chart
- Grafana v10+ — bundled in the stack, used for dashboards
- Alertmanager v0.26+ — bundled in the stack, used for routing alerts
- A Slack webhook URL — for alert notifications (free, takes 2 minutes to set up)
- 4GB+ RAM available in your cluster — Prometheus is memory-hungry; don’t run this on a single 1GB node
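Before diving in, a quick preflight check can save a failed install halfway through. A minimal sketch using only standard shell and the CLIs listed above:

```shell
# preflight.sh: confirm the required CLIs are installed and the
# cluster is reachable before starting the install.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1 (install it before continuing)"
  fi
}

for tool in kubectl helm; do
  check_tool "$tool"
done

# cluster-info exits non-zero if the kubeconfig is wrong or the API is down
kubectl cluster-info >/dev/null 2>&1 \
  && echo "ok: cluster reachable" \
  || echo "warning: cluster not reachable, check your kubeconfig"
```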
Step-by-Step Implementation
Step 1: Setup — Install the Prometheus Stack via Helm
The kube-prometheus-stack Helm chart is the industry standard. It installs Prometheus, Grafana, Alertmanager, kube-state-metrics, and node-exporter in one command, pre-wired together.
Add the Helm repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Create a dedicated namespace:
kubectl create namespace monitoring
Create your values override file — prometheus-values.yaml:
# prometheus-values.yaml
# This file overrides defaults. Only set what you need to change.
prometheus:
  prometheusSpec:
    # How long to retain metrics data on disk
    retention: 15d
    # Persistent storage for metrics — critical for production
    # Without this, all data is lost when the pod restarts
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    # Resource limits — tune to your cluster size
    resources:
      requests:
        memory: 1Gi
        cpu: 500m
      limits:
        memory: 2Gi
        cpu: 1000m

grafana:
  # Change this before deploying — never leave the default
  adminPassword: "changeme-use-a-secret-manager"
  persistence:
    enabled: true
    size: 10Gi
  # Pre-load the standard Kubernetes dashboards
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          orgId: 1
          folder: ''
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      kubernetes-cluster:
        # Grafana dashboard ID 7249 — Kubernetes cluster overview
        gnetId: 7249
        revision: 1
        datasource: Prometheus

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

# Disable components you don't need to reduce resource usage
kubeControllerManager:
  enabled: false  # Often inaccessible in managed clusters (GKE/EKS/AKS)
kubeScheduler:
  enabled: false  # Same — managed clusters don't expose this
kubeEtcd:
  enabled: false  # Managed clusters don't expose etcd metrics
Install the stack:
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --version 55.5.0
Verify all pods come up healthy:
kubectl get pods -n monitoring --watch
# Expected output (all should reach Running/1/1):
# NAME READY STATUS RESTARTS
# alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0
# kube-prometheus-stack-grafana-7d9b4f8d7-xk2lp 3/3 Running 0
# kube-prometheus-stack-operator-6b8f9d7c4-p9rmw 1/1 Running 0
# kube-prometheus-stack-prometheus-node-exporter-xxxx 1/1 Running 0
# prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0
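If you'd rather block until everything is healthy than eyeball the --watch output, kubectl wait performs the same check declaratively (the 300s timeout here is an arbitrary choice):

```shell
# Wait for every pod in the monitoring namespace to report Ready.
# kubectl wait exits non-zero on timeout, which makes it usable in CI.
result=$(kubectl wait --for=condition=Ready pods --all \
  --namespace monitoring --timeout=300s 2>&1 || echo "not ready")
echo "$result"
```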
Step 2: Configuration — Scraping Your Own Services
Out of the box, Prometheus scrapes cluster internals. To scrape your own application, expose a /metrics endpoint and create a ServiceMonitor.
First, instrument your Node.js app. Install the client library:
npm install prom-client@15
Add metrics to your server.js:
// server.js
import express from 'express';
import client from 'prom-client';

const app = express();

// Collect all default Node.js metrics (event loop lag, GC, heap, etc.)
// These are the metrics that make up the "golden signals"
const register = new client.Registry();
client.collectDefaultMetrics({
  register,
  prefix: 'saas_app_', // Namespace all metrics — avoids collisions
});

// --- Custom business metrics ---

// Counter: monotonically increasing, never resets (except on process restart)
// Use for: requests, errors, jobs processed
const httpRequestsTotal = new client.Counter({
  name: 'saas_app_http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

// Histogram: tracks distribution of values (p50, p95, p99 latencies)
// Use for: request duration, response sizes, queue wait times
const httpRequestDuration = new client.Histogram({
  name: 'saas_app_http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  // Buckets in seconds — cover your expected latency range
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [register],
});

// Gauge: can go up and down
// Use for: active connections, queue depth, cache size
const activeConnections = new client.Gauge({
  name: 'saas_app_active_connections',
  help: 'Number of active WebSocket connections',
  registers: [register],
});

// Middleware to track all requests automatically
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const labels = {
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode,
    };
    httpRequestsTotal.inc(labels);
    end(labels);
  });
  next();
});

// The /metrics endpoint — Prometheus scrapes this URL
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.get('/health', (req, res) => res.json({ status: 'ok' }));

app.listen(3000, () => console.log('Server running on :3000'));
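Before building an image, you can smoke-test the endpoint locally. A sketch that assumes npm install has been run and port 3000 is free:

```shell
# Start the server, scrape it once, and shut it down again.
node server.js &
SERVER_PID=$!
sleep 2   # give the server a moment to bind the port

# A healthy endpoint returns plain-text metrics in Prometheus exposition format
curl -s http://localhost:3000/metrics | grep "saas_app_" | head -5 \
  || echo "metrics endpoint not responding"

kill "$SERVER_PID" 2>/dev/null || true
```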
Expose the metrics port in your Kubernetes Service:
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: saas-app
  namespace: default
  labels:
    # This label is how ServiceMonitor finds this Service
    app: saas-app
spec:
  selector:
    app: saas-app
  ports:
    - name: http
      port: 3000
      targetPort: 3000
    - name: metrics    # Named port — ServiceMonitor references this by name
      port: 9090
      targetPort: 3000 # Both point to 3000; your /metrics is on the app port
Create a ServiceMonitor to tell Prometheus to scrape it:
# k8s/servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: saas-app-monitor
  namespace: monitoring  # Must be in the monitoring namespace
  labels:
    release: kube-prometheus-stack  # Must match the Helm release name
spec:
  namespaceSelector:
    matchNames:
      - default  # The namespace where your app lives
  selector:
    matchLabels:
      app: saas-app  # Matches the label on your Service
  endpoints:
    - port: metrics  # Matches the named port in your Service
      path: /metrics
      interval: 30s  # Scrape every 30 seconds
      scrapeTimeout: 10s
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/servicemonitor.yaml
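After applying, it's worth confirming both objects exist and carry what the Operator matches on (object names as used in this tutorial):

```shell
# The ServiceMonitor must carry the release label the Operator selects on
release_label=$(kubectl get servicemonitor saas-app-monitor -n monitoring \
  -o jsonpath='{.metadata.labels.release}' 2>&1 || echo "not found")
echo "ServiceMonitor release label: $release_label"

# The Service must expose the named 'metrics' port the endpoint references
kubectl get svc saas-app -n default \
  -o jsonpath='{.spec.ports[?(@.name=="metrics")].port}' 2>/dev/null \
  || echo "metrics port not found on Service saas-app"
```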
Step 3: Core Logic — Alerting Rules & Alertmanager Routing
Create your alert rules file — k8s/prometheus-rules.yaml:
# k8s/prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: saas-app-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # Must match the Helm release name
spec:
  groups:
    - name: saas-app.rules
      # Evaluate these rules every 60 seconds
      interval: 60s
      rules:
        # --- Availability Alerts ---
        - alert: PodCrashLooping
          # Pod has restarted more than 5 times in the last 15 minutes
          expr: rate(kube_pod_container_status_restarts_total{namespace="default"}[15m]) * 60 * 15 > 5
          for: 5m  # Must be true for 5 minutes before firing — reduces flapping
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is crash-looping"
            description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has restarted {{ $value | printf \"%.0f\" }} times in 15 minutes."
        - alert: DeploymentReplicasMismatch
          # Desired replicas don't match available replicas
          expr: kube_deployment_spec_replicas{namespace="default"} != kube_deployment_status_replicas_available{namespace="default"}
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} has unavailable replicas"
            description: "Deployment {{ $labels.deployment }} wants {{ $value }} replicas but they are not all available."
        # --- Performance Alerts ---
        - alert: HighRequestLatency
          # 95th percentile latency above 1 second for 5 minutes
          expr: histogram_quantile(0.95, sum(rate(saas_app_http_request_duration_seconds_bucket[5m])) by (le, route)) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High p95 latency on route {{ $labels.route }}"
            description: "95th percentile latency is {{ $value | printf \"%.2f\" }}s on route {{ $labels.route }}."
        - alert: HighErrorRate
          # More than 5% of requests returning 5xx over the last 5 minutes
          expr: |
            sum(rate(saas_app_http_requests_total{status_code=~"5.."}[5m]))
            /
            sum(rate(saas_app_http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Error rate above 5%"
            description: "{{ $value | humanizePercentage }} of requests are returning 5xx errors."
        # --- Resource Alerts ---
        - alert: NodeMemoryPressure
          # Node using more than 90% of available memory
          expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} memory above 90%"
            description: "Node memory usage is at {{ $value | humanizePercentage }}."
        - alert: PodHighCPU
          # Pod using more than 80% of its CPU limit for 10 minutes
          expr: |
            sum(rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[5m])) by (pod)
            /
            sum(kube_pod_container_resource_limits{namespace="default", resource="cpu"}) by (pod) > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} CPU above 80% of limit"
            description: "Pod {{ $labels.pod }} is using {{ $value | humanizePercentage }} of its CPU limit."
kubectl apply -f k8s/prometheus-rules.yaml
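A PrometheusRule with a PromQL syntax error applies cleanly as YAML but never loads. If you have promtool (it ships with Prometheus releases) and yq installed, you can lint the expressions before applying. A sketch, assuming both tools are on your PATH:

```shell
# promtool expects a plain rules file, i.e. just the `groups:` section,
# so strip the Kubernetes wrapper off the CR first.
if command -v yq >/dev/null 2>&1 && command -v promtool >/dev/null 2>&1; then
  yq '.spec' k8s/prometheus-rules.yaml > /tmp/rules-check.yaml
  promtool check rules /tmp/rules-check.yaml && lint="passed" || lint="failed"
else
  lint="skipped (yq and promtool not installed)"
fi
echo "rule lint: $lint"
```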
Configure Alertmanager to route to Slack. Create k8s/alertmanager-config.yaml:
# k8s/alertmanager-config.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: saas-app-alertmanager-config
  namespace: monitoring
spec:
  route:
    # Group alerts by alertname and namespace to reduce noise
    groupBy: ['alertname', 'namespace']
    groupWait: 30s      # Wait 30s before sending the first alert in a group
    groupInterval: 5m   # How long to wait before re-sending an ongoing group
    repeatInterval: 4h  # How often to re-notify if alert is still firing
    receiver: slack-critical
    routes:
      # Critical alerts go to #incidents channel immediately
      - matchers:
          - name: severity
            value: critical
        receiver: slack-critical
      # Warnings go to #alerts channel with less urgency
      - matchers:
          - name: severity
            value: warning
        receiver: slack-warnings
  receivers:
    - name: slack-critical
      slackConfigs:
        - apiURL:
            # Store the webhook URL as a Kubernetes Secret, not inline
            name: alertmanager-slack-secret
            key: webhookURL
          channel: '#incidents'
          sendResolved: true  # Also notify when the alert clears
          title: '{{ if eq .Status "firing" }}🔴{{ else }}✅{{ end }} {{ .CommonLabels.alertname }}'
          text: |
            {{ range .Alerts }}
            *Summary:* {{ .Annotations.summary }}
            *Description:* {{ .Annotations.description }}
            *Severity:* {{ .Labels.severity }}
            *Started:* {{ .StartsAt | since }}
            {{ end }}
    - name: slack-warnings
      slackConfigs:
        - apiURL:
            name: alertmanager-slack-secret
            key: webhookURL
          channel: '#alerts'
          sendResolved: true
          title: '⚠️ {{ .CommonLabels.alertname }}'
          text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
Create the Kubernetes Secret for the Slack webhook:
kubectl create secret generic alertmanager-slack-secret \
  --from-literal=webhookURL='https://hooks.slack.com/services/YOUR/WEBHOOK/URL' \
  --namespace monitoring
kubectl apply -f k8s/alertmanager-config.yaml
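The Operator merges every AlertmanagerConfig into Alertmanager's runtime configuration. One way to confirm the merge happened is to inspect the generated config from inside the pod (pod name as produced by this chart's defaults; amtool ships inside the Alertmanager image):

```shell
# Print the merged runtime config and look for our receiver
kubectl exec -n monitoring alertmanager-kube-prometheus-stack-alertmanager-0 \
  -- amtool config show --alertmanager.url=http://localhost:9093 2>/dev/null \
  | grep -A 2 "slack-critical" \
  || echo "could not read merged config; check the Operator logs"
```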
Step 4: Testing — Verify Everything Is Wired Up
4a. Access Prometheus UI:
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090
Open http://localhost:9090. Go to Status → Targets — you should see your saas-app target listed as UP.
4b. Access Grafana:
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3001:80
Open http://localhost:3001 and log in as admin with the password you set in prometheus-values.yaml. Go to Dashboards → Browse — the Kubernetes cluster dashboard should be pre-loaded.
4c. Run a live PromQL query to confirm your app metrics are flowing:
In the Prometheus UI Expression field, run:
# Requests per second, broken down by route and status code
sum(rate(saas_app_http_requests_total[5m])) by (route, status_code)
# p95 latency across all routes
histogram_quantile(0.95, sum(rate(saas_app_http_request_duration_seconds_bucket[5m])) by (le))
# Check how many pod restarts have occurred in the last hour
increase(kube_pod_container_status_restarts_total{namespace="default"}[1h])
4d. Trigger a test alert to confirm Alertmanager → Slack routing works:
# Port-forward Alertmanager
kubectl port-forward svc/kube-prometheus-stack-alertmanager -n monitoring 9093:9093
# Send a test alert via the Alertmanager API
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "critical",
      "namespace": "default"
    },
    "annotations": {
      "summary": "This is a test alert — ignore",
      "description": "Fired manually to verify Slack routing."
    },
    "startsAt": "2024-01-01T00:00:00Z",
    "endsAt": "2099-01-01T00:00:00Z"
  }]'
Check #incidents in Slack. If the message arrives, your full pipeline — metric collection → alerting → notification — is working end to end.
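The API injection above only proves routing. To watch a rule fire end to end, generate sustained traffic against the app instead. The helper below is a hypothetical sketch: it assumes the app is port-forwarded to localhost:3000 and that some route returns the status you want to provoke.

```shell
# generate_load N URL: send N sequential requests at roughly 2 per second.
generate_load() {
  n=$1
  url=$2
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null --max-time 2 "$url" || true
    i=$((i + 1))
    sleep 0.5
  done
  echo "sent $n requests to $url"
}

# Roughly 5 minutes of traffic keeps the rate() windows populated long
# enough for a `for: 5m` clause to evaluate true, e.g.:
#   generate_load 600 http://localhost:3000/some-failing-route
```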
Common Errors & Troubleshooting
Gotcha #1: ServiceMonitor created but target shows as Unknown or never appears in Prometheus
Symptom: You applied the ServiceMonitor but http://localhost:9090/targets doesn’t show your app, or shows it as Unknown.
Fix: The release label on the ServiceMonitor must exactly match the Helm release name. The Prometheus Operator uses this label to decide which ServiceMonitor resources it should pick up.
# Check what label the Operator is watching for
kubectl get prometheus -n monitoring -o yaml | grep serviceMonitorSelector -A5
# Output will look like:
# serviceMonitorSelector:
# matchLabels:
# release: kube-prometheus-stack <-- This must match your ServiceMonitor label
# Verify your ServiceMonitor has it
kubectl get servicemonitor saas-app-monitor -n monitoring -o yaml | grep release
If the label is missing or mismatched, patch it:
kubectl label servicemonitor saas-app-monitor -n monitoring release=kube-prometheus-stack
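After the label fix, the Operator regenerates the scrape config and Prometheus reloads it, usually within a minute. You can confirm via the HTTP API instead of the UI; this assumes the port-forward from Step 4a is still running and jq is installed:

```shell
# List every active scrape pool; a working ServiceMonitor shows up as
# something like serviceMonitor/monitoring/saas-app-monitor/0
pools=$(curl -s http://localhost:9090/api/v1/targets 2>/dev/null \
  | jq -r '.data.activeTargets[].scrapePool' 2>/dev/null | sort -u)
echo "${pools:-Prometheus API not reachable; is the port-forward running?}"
```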
Gotcha #2: PrometheusRule is applied but alerts never appear in http://localhost:9090/alerts
Symptom: kubectl get prometheusrule -n monitoring shows your rule, but the Prometheus UI shows no alerts under the rule name.
Fix: Same root cause as Gotcha #1 — the release label. Also check the rule for PromQL syntax errors, which silently prevent the rule from loading:
# Check Prometheus Operator logs for rule evaluation errors
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-operator --tail=50
# Validate your PromQL expressions before applying
# Use the Prometheus UI: http://localhost:9090/graph
# Paste the expr value and confirm it returns data before putting it in a rule
# Check the rule was picked up by the Operator
kubectl get prometheusrule saas-app-alerts -n monitoring -o jsonpath='{.metadata.labels}'
Gotcha #3: Prometheus pod is OOMKilled — keeps restarting
Symptom: kubectl describe pod prometheus-kube-prometheus-stack-prometheus-0 -n monitoring shows OOMKilled as the last state reason.
Fix: Prometheus stores all active time series in memory. Three levers to pull:
# In prometheus-values.yaml — increase the memory limit
prometheus:
  prometheusSpec:
    resources:
      limits:
        memory: 4Gi  # Bump from 2Gi
    # Also reduce retention to lower the total series count
    retention: 7d  # Down from 15d
    # Third lever: drop high-cardinality series at scrape time with
    # metricRelabelings on your ServiceMonitor endpoints. Each unique
    # label combination is a separate in-memory time series.
Find your highest-cardinality metrics to understand what’s eating memory:
# Top 10 metrics by number of active time series
topk(10, count by (__name__)({__name__=~".+"}))
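Once the topk query has named the offenders, you can drop them at scrape time with metricRelabelings on the ServiceMonitor endpoint. A sketch; the metric name regex is a placeholder for whatever the query surfaced:

```yaml
# k8s/servicemonitor.yaml (excerpt): metricRelabelings run after each
# scrape and before storage, so dropped series never consume memory.
endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
    metricRelabelings:
      - sourceLabels: [__name__]
        regex: 'some_noisy_metric_.*'  # placeholder: use your topk results
        action: drop
```

action: drop discards matching series; action: keep is the inverse and discards everything that does not match.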
Apply the new values:
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --version 55.5.0
Security Checklist
- Never expose Prometheus or Alertmanager publicly — both UIs have no authentication by default; use kubectl port-forward for local access or put them behind an authenticated ingress (OAuth2 Proxy is a common pattern)
- Store all secrets as Kubernetes Secrets, not inline in YAML — the Slack webhook URL, Grafana admin password, and any remote write credentials must live in Secrets, not values.yaml
- Commit prometheus-values.yaml to git but ensure it contains no literal secrets — use placeholder values and inject via CI or a secrets manager (Vault, External Secrets Operator)
- Set runAsNonRoot: true in all monitoring pod security contexts — the Helm chart does this by default; don't override it
- Enable RBAC — the Prometheus Operator creates least-privilege ClusterRoles; audit them with kubectl get clusterrole | grep kube-prometheus-stack before promoting to production (ClusterRoles are cluster-scoped, so there is no namespace flag to pass)
- Restrict /metrics endpoint access — add a NetworkPolicy so only Prometheus pods can scrape your app's metrics port
- Rotate Grafana's admin password immediately after first login and store it in a Secret, not in values.yaml
- Use a scrapeTimeout smaller than the scrape interval — prevents slow targets from blocking the scrape queue and hiding other metrics

Finly Insights Team is a group of software developers, cloud engineers, and technical writers with real hands-on experience in the tech industry. We specialize in cloud computing, cybersecurity, SaaS tools, AI automation, and API development. Every article we publish is thoroughly researched, written, and reviewed by people who have actually worked in these fields.