30.10.2024

Prometheus Monitoring System: Complete Setup Guide for Infrastructure & Application Monitoring

As modern infrastructure grows in complexity — spanning microservices, containers, and distributed systems — robust monitoring is no longer optional. It is a fundamental requirement for maintaining performance, reliability, and scalability. Prometheus, an open-source monitoring and alerting toolkit, has emerged as one of the most widely adopted solutions for collecting and visualizing time-series metrics across diverse environments.

Originally developed by SoundCloud and now a graduated Cloud Native Computing Foundation (CNCF) project, Prometheus is trusted by engineering teams worldwide. This comprehensive guide covers everything you need to know: what Prometheus is, how it works, its core components, and a complete step-by-step setup process — including Node Exporter, Grafana dashboards, and alerting configuration.

What Is Prometheus?

Prometheus is a powerful, open-source monitoring system designed to collect, store, and query time-series data — measurements or events tracked over time with high-resolution timestamps. It enables teams to visualize system health, analyze trends, and trigger intelligent alerts before small issues escalate into critical outages.

Key Features of Prometheus

Feature	Description
Time-Series Data Model	Metrics are stored as time-stamped sequences, enabling trend analysis and historical comparisons
Pull-Based Monitoring	Prometheus actively scrapes metrics from target endpoints rather than waiting for systems to push data
PromQL	A flexible, expressive query language for filtering, aggregating, and analyzing metrics in real time
Alertmanager Integration	Define threshold-based rules and route notifications to email, Slack, PagerDuty, and more
Service Discovery	Automatically discovers and scrapes targets in dynamic environments such as Kubernetes clusters
Multi-Dimensional Data	Labels allow you to slice and dice metrics across dimensions like region, instance, or service name

These capabilities make Prometheus an ideal choice for teams running workloads on VPS Hosting, bare-metal infrastructure, or containerized platforms.

How Prometheus Works

Prometheus follows a clean, well-defined architecture built around the collection and storage of time-series data. Understanding this architecture is essential before deploying it in production.

Core Workflow

Metric Collection (Scraping): Prometheus periodically sends HTTP requests to configured target endpoints (typically *exporters* or applications instrumented with a client library) to collect metrics. The scrape interval is configurable globally and per job.

Time-Series Storage: Collected metrics are persisted in Prometheus's built-in time-series database (TSDB). Each data point carries a Unix timestamp and a set of key-value labels for identification.

Querying with PromQL: Engineers use PromQL to query stored metrics, generate graphs, build dashboards, or define alert conditions based on real-time and historical data.

Alerting Pipeline: When an alerting rule's expression stays true for its configured duration, Prometheus sends the alert to Alertmanager, which deduplicates, groups, and routes notifications to the appropriate channels.

The pull-based model is a deliberate architectural choice. It simplifies network security (targets don't need outbound access to a central server), makes configuration transparent, and scales well in distributed environments.
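
To make the pull model concrete, here is a minimal Python sketch of what a target returns when Prometheus scrapes it: plain text with one sample per line. The metric name `demo_requests_total`, its labels, and the `render_metrics` helper are hypothetical, and the sketch skips label-value escaping and the HELP/TYPE comment lines that real exporters emit.

```python
def render_metrics(samples):
    """Render {(name, labels): value} in the Prometheus text exposition format."""
    lines = []
    for (name, labels), value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        # Series without labels are written as the bare metric name.
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical application counters; labels are tuples of (key, value) pairs.
SAMPLES = {
    ("demo_requests_total", (("method", "GET"), ("status", "200"))): 42,
    ("demo_requests_total", (("method", "POST"), ("status", "500"))): 3,
}

print(render_metrics(SAMPLES), end="")
```

In practice you would rarely hand-write this output; the official client libraries generate it and serve it over HTTP for you.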

Prometheus Core Components

Prometheus is a modular ecosystem. Each component serves a specific role:

1. Prometheus Server

The central engine responsible for scraping targets, storing metrics in the TSDB, evaluating alerting rules, and serving the PromQL API and web UI.

2. Exporters

Exporters are lightweight agents or adapters that expose metrics in a Prometheus-compatible format. Key exporters include:

Node Exporter — Collects hardware and OS-level metrics: CPU, memory, disk I/O, network throughput, filesystem usage
Blackbox Exporter — Probes external endpoints over HTTP, HTTPS, DNS, TCP, and ICMP for availability and latency
Database Exporters — Dedicated exporters exist for PostgreSQL, MySQL, Redis, MongoDB, and many others
Application-Specific Exporters — Most modern applications and frameworks expose a /metrics endpoint natively

3. Alertmanager

Handles the full alerting lifecycle: receiving alerts from Prometheus, deduplicating and grouping them, applying silences and inhibition rules, and routing notifications to receivers such as Slack, email, PagerDuty, or OpsGenie.

4. PromQL (Prometheus Query Language)

A purpose-built functional query language for time-series data. PromQL supports instant vectors, range vectors, aggregation operators, mathematical functions, and subqueries — giving you deep analytical power over your metrics.
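
As a rough illustration of what a range-vector function does, the Python sketch below approximates `rate()` over a window of counter samples. It is a simplification, not the PromQL implementation: it handles counter resets but omits the extrapolation to the window boundaries that the real function applies. The function name and sample data are made up for the example.

```python
def simple_rate(samples):
    """Per-second increase of a counter over [(timestamp, value), ...].

    Simplified: accounts for counter resets, but does not extrapolate
    to the edges of the window the way PromQL's rate() does.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter dropped, so the process restarted and the
            # counter began again from zero: count the new value in full.
            increase += current
    return increase / elapsed


# Hypothetical samples scraped every 15 seconds, with one reset at t=45.
window = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 40.0)]
print(simple_rate(window))
```

The reset handling is the reason counters should be queried through `rate()` or `increase()` rather than graphed raw.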

5. Pushgateway

Designed for ephemeral or batch jobs that cannot be scraped directly (e.g., a cron job that runs for 30 seconds). These jobs push their metrics to the Pushgateway, which Prometheus then scrapes on its regular interval.
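
A batch job can push its metrics with nothing more than an HTTP request. The sketch below builds a PUT request against the Pushgateway's `/metrics/job/<job>` path using only the Python standard library. The gateway address, job name, metric name, and the `build_push_request` helper are assumptions for illustration, and the sketch assumes label values that contain no slashes.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway, job, body, grouping=None):
    """Build (but do not send) a PUT that replaces the metrics of one group."""
    path = f"/metrics/job/{quote(job, safe='')}"
    for name, value in (grouping or {}).items():
        path += f"/{name}/{quote(value, safe='')}"
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical nightly job recording when it last finished successfully.
request = build_push_request(
    "http://localhost:9091",
    "nightly_backup",
    "backup_last_success_timestamp_seconds 1700000000\n",
    grouping={"instance": "db1"},
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request)  # send it once a Pushgateway is running
```

Pushed metrics stay in the Pushgateway until they are overwritten or deleted, so it suits job-level results better than per-instance health.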

6. Grafana

While not part of Prometheus itself, Grafana is the de facto visualization layer for Prometheus data. It connects to Prometheus as a data source and enables the creation of rich, interactive dashboards with panels, variables, and annotations.

Step-by-Step Prometheus Setup on Linux

The following guide walks you through a complete Prometheus deployment on a Linux server, including Node Exporter for system metrics, Grafana for visualization, and Alertmanager for notifications.

> Prerequisites: A Linux server (Ubuntu 20.04/22.04 or CentOS/RHEL 8+), sudo or root access, and basic familiarity with the command line. If you need a reliable server environment, consider AlexHost VPS Hosting for a performant, low-latency foundation.

Step 1: Install Prometheus

Download and extract the latest Prometheus release:

wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
tar -xvf prometheus-2.47.0.linux-amd64.tar.gz
cd prometheus-2.47.0.linux-amd64

> Tip: Always check the official Prometheus releases page for the latest stable version before downloading.

Create a dedicated system user and directory structure:

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo cp prometheus.yml /etc/prometheus/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

Create a systemd service for Prometheus:

sudo nano /etc/systemd/system/prometheus.service

Paste the following content:

[Unit]
Description=Prometheus Monitoring System
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d

[Install]
WantedBy=multi-user.target

Enable and start Prometheus:

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus

Prometheus will now be accessible at http://your-server-ip:9090.
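
You can also confirm the server is up without opening a browser. The sketch below polls the `/-/ready` management endpoint, which returns HTTP 200 once Prometheus is ready to serve traffic. The `is_ready` helper and its injectable `opener` parameter are illustrative choices, not part of any Prometheus tooling.

```python
import urllib.request


def is_ready(base_url, opener=urllib.request.urlopen, timeout=5):
    """Return True if GET <base_url>/-/ready answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/-/ready"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both derive from OSError, so refused
        # connections, timeouts, and error statuses all end up here.
        return False


# Example call, assuming Prometheus runs on the same host:
# is_ready("http://localhost:9090")
```

The `opener` argument exists only so the function can be exercised without a live server; in normal use you would leave it at its default.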

Step 2: Configure Prometheus

The primary configuration file is /etc/prometheus/prometheus.yml. This file defines global settings and the scrape targets Prometheus monitors.

Basic configuration example:

global:
  scrape_interval: 15s        # Default scrape frequency
  evaluation_interval: 15s    # How often alerting rules are evaluated
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

rule_files:
  - "/etc/prometheus/rules/*.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

After modifying this file, always validate the configuration before restarting:

promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus

Step 3: Install Node Exporter for System Metrics

Node Exporter exposes detailed hardware and OS metrics from the host system — essential for monitoring CPU load, memory pressure, disk utilization, and network throughput.

Download and install Node Exporter:

wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Create a systemd service for Node Exporter:

sudo nano /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Node Exporter now exposes metrics at http://your-server-ip:9100/metrics.
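
The output at that URL is plain text, which makes it easy to inspect programmatically. The following Python sketch parses a few sample lines into (name, labels, value) tuples. The parser is deliberately minimal and is an assumption-laden illustration rather than a full implementation of the exposition format: it ignores HELP/TYPE comments and optional timestamps, and the sample values are invented.

```python
import re

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            yield name, labels, float(value)


# Invented excerpt in the shape Node Exporter produces.
EXCERPT = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
node_memory_MemAvailable_bytes 4.125e+09
"""

parsed = list(parse_metrics(EXCERPT))
for name, labels, value in parsed:
    print(name, labels, value)
```

Each distinct combination of metric name and labels in this output becomes one time series in Prometheus.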

Add Node Exporter as a scrape target in prometheus.yml:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]

Restart Prometheus and verify the target appears as UP in the Prometheus UI under Status → Targets.
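
The same target information is available from the HTTP API at `/api/v1/targets`, which is handy for scripted checks. The sketch below extracts the targets that are not healthy from a response body. The `unhealthy_targets` helper and the sample payload are hypothetical; the payload is trimmed to the fields the function reads.

```python
import json


def unhealthy_targets(payload):
    """Return (job, instance, last_error) for every target that is not up."""
    targets = json.loads(payload)["data"]["activeTargets"]
    return [
        (
            target["labels"].get("job"),
            target["labels"].get("instance"),
            target.get("lastError", ""),
        )
        for target in targets
        if target["health"] != "up"
    ]


# Trimmed, invented response in the shape returned by /api/v1/targets.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "prometheus", "instance": "localhost:9090"},
         "health": "up", "lastError": ""},
        {"labels": {"job": "node_exporter", "instance": "localhost:9100"},
         "health": "down", "lastError": "connection refused"},
    ]},
})

print(unhealthy_targets(SAMPLE_RESPONSE))
```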

Step 4: Visualize Metrics with Grafana

Grafana transforms raw Prometheus metrics into actionable, visually rich dashboards. It is the standard visualization layer for Prometheus deployments.

Install Grafana on Ubuntu/Debian:

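sudo apt-get install -y apt-transport-https software-properties-common wget
# apt-key is deprecated; store the repository key in a dedicated keyring instead
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server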

Connect Prometheus as a data source:

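Open Grafana at http://your-server-ip:3000 (default credentials: admin / admin; Grafana prompts you to change the password on first login)
Navigate to Configuration → Data Sources → Add data source (in newer Grafana releases: Connections → Data sources)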
Select Prometheus
Set the URL to http://localhost:9090
Click Save & Test

Import a pre-built dashboard:

Go to Dashboards → Import
Enter dashboard ID 1860 (Node Exporter Full) from Grafana's marketplace
Select your Prometheus data source and click Import

You will immediately have a comprehensive view of CPU usage, memory consumption, disk I/O, network statistics, and system load — all in a single interactive dashboard.

Step 5: Configure Alerting Rules and Alertmanager

Prometheus alerting consists of two parts: alerting rules defined in Prometheus, and the Alertmanager that handles routing and delivery.

Create an alerting rules file:

sudo mkdir -p /etc/prometheus/rules
sudo nano /etc/prometheus/rules/system_alerts.yml

groups:
  - name: system_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage has exceeded 80% for more than 5 minutes. Current value: {{ $value }}%"

      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Available disk space on / is below 15%. Immediate action required."

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage has exceeded 85% for more than 5 minutes."

      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Prometheus target {{ $labels.instance }} has been unreachable for more than 1 minute."

Validate the rules file:

promtool check rules /etc/prometheus/rules/system_alerts.yml
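
The `for` field in these rules is what keeps a brief spike from paging anyone: an alert is first *pending* and only becomes *firing* once its expression has been true continuously for that long. The Python sketch below models that state machine under simplifying assumptions (a fixed list of evaluation results, timestamps in seconds); the function name and data are illustrative.

```python
def alert_states(evaluations, for_seconds):
    """Map [(timestamp, expr_is_true), ...] to inactive/pending/firing."""
    active_since = None
    states = []
    for timestamp, is_true in evaluations:
        if not is_true:
            # Any false evaluation resets the pending timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Evaluations every 15 seconds against a rule with "for: 1m".
evaluations = [
    (0, False), (15, True), (30, True), (45, False),
    (60, True), (75, True), (90, True), (105, True), (120, True),
]
states = alert_states(evaluations, for_seconds=60)
print(states)
```

Note how the single false result at t=45 sends the alert back to inactive, so the one-minute clock starts again at t=60.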

Install and configure Alertmanager:

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo mkdir /etc/alertmanager

Create the Alertmanager configuration:

sudo nano /etc/alertmanager/alertmanager.yml

global:
  smtp_smarthost: 'smtp.yourdomain.com:587'
  smtp_from: 'alerts@yourdomain.com'
  smtp_auth_username: 'alerts@yourdomain.com'
  smtp_auth_password: 'your_password'

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@yourdomain.com'
        send_resolved: true

Start Alertmanager with this configuration. The command below runs in the foreground; for a permanent setup, wrap it in a systemd unit modeled on the Prometheus and Node Exporter services:

alertmanager --config.file=/etc/alertmanager/alertmanager.yml

Alertmanager listens on port 9093 by default, matching the alerting target in prometheus.yml. Restart Prometheus afterwards so it loads the new rules:

sudo systemctl restart prometheus

> Note: For professional email delivery in your alerting pipeline, consider pairing Prometheus with AlexHost Email Hosting for reliable SMTP infrastructure.
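
To see what `group_by: ['alertname', 'instance']` means in practice, the sketch below buckets a handful of alerts the way that setting would, so that alerts sharing both labels end up in one notification. This is only a model of the grouping idea, not Alertmanager's code; the alert data and the `group_alerts` helper are invented.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert label sets by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Invented alerts: two filesystems on web1 are low on space at once.
alerts = [
    {"alertname": "LowDiskSpace", "instance": "web1:9100", "device": "/dev/sda1"},
    {"alertname": "LowDiskSpace", "instance": "web1:9100", "device": "/dev/sdb1"},
    {"alertname": "HighCPUUsage", "instance": "web1:9100"},
    {"alertname": "LowDiskSpace", "instance": "web2:9100", "device": "/dev/sda1"},
]

groups = group_alerts(alerts, ["alertname", "instance"])
for key, members in groups.items():
    print(key, len(members))
```

Grouping by fewer labels (for example only `alertname`) produces fewer, larger notifications; grouping by more labels produces more, smaller ones.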

Advanced Configuration: Service Discovery

For dynamic environments — such as Kubernetes clusters or auto-scaling server fleets — static target lists quickly become unmanageable. Prometheus supports multiple service discovery mechanisms out of the box:

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

Supported service discovery backends include Kubernetes, Consul, EC2, Azure, GCE, DNS-based discovery, and file-based discovery — making Prometheus adaptable to virtually any infrastructure topology.
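
File-based discovery is the simplest of these to script: Prometheus watches a JSON or YAML file listing target groups and picks up changes without a restart. The Python sketch below generates such a file's contents from an inventory. The host names, the `env` label, and the `file_sd_groups` helper are assumptions; you would reference the resulting file from a `file_sd_configs` entry in prometheus.yml.

```python
import json


def file_sd_groups(hosts, port=9100):
    """Turn [(host, environment), ...] into file_sd target groups."""
    by_env = {}
    for host, env in hosts:
        by_env.setdefault(env, []).append(f"{host}:{port}")
    return [
        {"targets": sorted(targets), "labels": {"env": env}}
        for env, targets in sorted(by_env.items())
    ]


# Hypothetical inventory; in practice this might come from a CMDB or cloud API.
hosts = [
    ("web1.example.com", "prod"),
    ("web2.example.com", "prod"),
    ("stage1.example.com", "staging"),
]

target_groups = file_sd_groups(hosts)
print(json.dumps(target_groups, indent=2))
```

Writing the output to a temporary file and renaming it into place avoids Prometheus reading a half-written file.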

Prometheus Use Cases

Prometheus excels across a broad range of monitoring scenarios:

Infrastructure Monitoring

Collect server-level metrics — CPU, memory, disk, network — from every node in your fleet. Whether you're running a single Dedicated Server or a large cluster, Prometheus provides unified visibility across all hosts.

Application Performance Monitoring (APM)

Track request rates, error rates, response latencies (the RED method), and resource consumption for web applications and APIs. Prometheus client libraries are available for Go, Python, Java, Ruby, Node.js, and more.
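
Latency in particular is usually recorded as a histogram with cumulative buckets, where each bucket counts the observations less than or equal to its upper bound (`le`). The sketch below shows that bookkeeping in plain Python. It is a teaching model with invented bucket bounds, not the client library's implementation, which you should use in real services.

```python
import bisect


class LatencyHistogram:
    """Cumulative-bucket histogram in the style Prometheus uses."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One counter per upper bound, plus a final slot for +Inf.
        self.counts = [0] * (len(self.bounds) + 1)
        self.sum = 0.0

    def observe(self, seconds):
        # bisect_left finds the first bucket whose bound is >= the value.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.sum += seconds

    def buckets(self):
        """Return (le, cumulative_count) pairs, ending with +Inf."""
        running, result = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            result.append((bound, running))
        return result


hist = LatencyHistogram([0.1, 0.5, 1.0])
for latency in (0.05, 0.1, 0.3, 2.0):
    hist.observe(latency)
print(hist.buckets())
```

Because the buckets are cumulative, the `+Inf` bucket always equals the total number of observations, which is what lets PromQL estimate quantiles from them.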

Kubernetes and Container Monitoring

Prometheus integrates natively with Kubernetes through the kube-state-metrics and cAdvisor exporters, providing deep visibility into pod health, resource quotas, deployment status, and cluster-level metrics.

Database Monitoring

Monitor query performance, connection pool utilization, replication lag, and cache hit ratios for databases like PostgreSQL, MySQL, and Redis using dedicated exporters.

Custom Business Metrics

Instrument your own applications to expose domain-specific metrics — such as orders processed per second, active user sessions, or payment transaction rates — enabling business-level observability alongside technical metrics.

GPU Workload Monitoring

For teams running machine learning or high-performance compute workloads, Prometheus can integrate with DCGM exporters to monitor GPU utilization, memory, and temperature. This pairs well with AlexHost GPU Hosting for AI and ML infrastructure.

Prometheus vs. Alternative Monitoring Solutions

Feature	Prometheus	Nagios	Zabbix	Datadog
Data Model	Time-series with labels	Check-based	Item-based	Time-series with tags
Collection Model	Pull (+ Pushgateway)	Active/Passive checks	Agent-based	Agent-based
Query Language	PromQL	None	Custom	Custom
Kubernetes Native	Yes (first-class)	Limited	Limited	Yes (paid)
Cost	Free / Open Source	Free / Open Source	Free / Open Source	Commercial SaaS
Scalability	High (with Thanos/Cortex)	Moderate	Moderate	High

Production Best Practices

Deploying Prometheus in production requires attention to several operational concerns:

Data Retention: The default retention period is 15 days. Adjust --storage.tsdb.retention.time based on your storage capacity and compliance requirements. For long-term storage, consider Thanos or Cortex.
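Security: Restrict access to the Prometheus web UI and API. Since v2.24, Prometheus supports TLS and basic authentication through a web configuration file (--web.config.file); placing it behind a reverse proxy (Nginx or Caddy) with authentication remains a common alternative.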
High Availability: Run multiple Prometheus instances scraping the same targets for redundancy. Use Alertmanager's clustering feature to prevent duplicate notifications.
Cardinality Management: Avoid labels with high-cardinality values (e.g., user IDs, request IDs). Every unique combination of label values creates a separate time series, which drives up memory use and slows queries.
TLS Encryption: Enable TLS for scrape endpoints and the Prometheus API. Pair this with an SSL Certificate to secure all communications between Prometheus components.
Resource Planning: Prometheus is memory-intensive, and its memory use scales with the number of active time series, so size RAM accordingly. On disk, samples compress to approximately 1–2 bytes each.
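
For disk capacity, the Prometheus documentation suggests a rough formula: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample (about 1 to 2 on disk). The Python sketch below applies it; the series count and the `estimate_disk_bytes` helper are made-up inputs for illustration, and the result is an estimate rather than a guarantee.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Rough TSDB disk need: retention * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical fleet: 100,000 active series, 15s scrapes, 30 days retention.
estimate = estimate_disk_bytes(100_000, 15, 30)
print(f"{estimate / 1e9:.2f} GB")
```

Halving the scrape interval doubles the estimate, which is why interval and retention are worth deciding together.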

Frequently Asked Questions

Q: What is the difference between Prometheus and Grafana?

Prometheus is the monitoring and alerting backend — it collects, stores, and queries metrics. Grafana is a visualization frontend that connects to Prometheus (and other data sources) to render dashboards. They are complementary tools, not alternatives.

Q: Can Prometheus monitor Windows servers?

Yes. The Windows Exporter (formerly WMI Exporter) exposes Windows system metrics in a Prometheus-compatible format, covering CPU, memory, disk, network, IIS, and more.

Q: How does Prometheus handle high availability?

Prometheus itself is designed to be run as a single instance per cluster. For HA, you run two identical Prometheus servers scraping the same targets. Alertmanager supports native clustering to deduplicate alerts across multiple Prometheus instances.

Q: What is PromQL used for?

PromQL (Prometheus Query Language) is used to query time-series data stored in Prometheus. It supports instant queries, range queries, aggregations, mathematical operations, and functions — enabling everything from simple metric lookups to complex anomaly detection expressions.
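
PromQL is also reachable programmatically through the HTTP API's `/api/v1/query` endpoint. The sketch below builds a query URL and parses an instant-vector response using only the Python standard library. The helper names and the sample response are hypothetical, and the response is trimmed to the fields the code reads.

```python
import json
from urllib.parse import urlencode


def query_url(base_url, promql):
    """Build the URL for an instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Map each series' instance label to its current value."""
    body = json.loads(payload)
    if body["status"] != "success" or body["data"]["resultType"] != "vector":
        raise ValueError("expected a successful instant-vector response")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


# Invented response for the query "up".
SAMPLE_QUERY_RESPONSE = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "instance": "localhost:9090"},
         "value": [1700000000.0, "1"]},
        {"metric": {"__name__": "up", "instance": "localhost:9100"},
         "value": [1700000000.0, "0"]},
    ]},
})

print(query_url("http://localhost:9090", "up == 0"))
print(parse_vector(SAMPLE_QUERY_RESPONSE))
```

Sample values arrive as strings in the JSON, so converting them with `float()` is a required step rather than a nicety.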

Conclusion

Prometheus is a battle-tested, production-grade monitoring solution that provides deep, real-time visibility into the health and performance of your entire infrastructure stack. Its pull-based architecture, multi-dimensional data model, powerful PromQL query language, and seamless integrations with tools like Grafana and Alertmanager make it the gold standard for modern observability.

Whether you are monitoring a single Linux server, a fleet of Dedicated Servers, a Kubernetes cluster, or a complex microservices application, Prometheus scales to meet your needs. By following the steps in this guide — installing Prometheus, deploying Node Exporter, configuring Grafana dashboards, and setting up intelligent alerting rules — you will have a robust monitoring foundation that helps you detect anomalies early, respond to incidents faster, and continuously improve system reliability.

Start with the basics, iterate on your dashboards and alert thresholds as you learn your system's normal behavior, and progressively expand coverage to every layer of your stack. Prometheus is not just a monitoring tool — it is a cornerstone of modern site reliability engineering.
