Prometheus Monitoring System: Complete Setup Guide for Infrastructure & Application Monitoring
As modern infrastructure grows in complexity — spanning microservices, containers, and distributed systems — robust monitoring is no longer optional. It is a fundamental requirement for maintaining performance, reliability, and scalability. Prometheus, an open-source monitoring and alerting toolkit, has emerged as one of the most widely adopted solutions for collecting and visualizing time-series metrics across diverse environments.
Originally developed by SoundCloud and now a graduated Cloud Native Computing Foundation (CNCF) project, Prometheus is trusted by engineering teams worldwide. This comprehensive guide covers everything you need to know: what Prometheus is, how it works, its core components, and a complete step-by-step setup process — including Node Exporter, Grafana dashboards, and alerting configuration.
What Is Prometheus?
Prometheus is a powerful, open-source monitoring system designed to collect, store, and query time-series data — measurements or events tracked over time with high-resolution timestamps. It enables teams to visualize system health, analyze trends, and trigger intelligent alerts before small issues escalate into critical outages.
Key Features of Prometheus
| Feature | Description |
|---|---|
| Time-Series Data Model | Metrics are stored as time-stamped sequences, enabling trend analysis and historical comparisons |
| Pull-Based Monitoring | Prometheus actively scrapes metrics from target endpoints rather than waiting for systems to push data |
| PromQL | A flexible, expressive query language for filtering, aggregating, and analyzing metrics in real time |
| Alertmanager Integration | Define threshold-based rules and route notifications to email, Slack, PagerDuty, and more |
| Service Discovery | Automatically discovers and scrapes targets in dynamic environments such as Kubernetes clusters |
| Multi-Dimensional Data | Labels allow you to slice and dice metrics across dimensions like region, instance, or service name |
These capabilities make Prometheus an ideal choice for teams running workloads on VPS Hosting, bare-metal infrastructure, or containerized platforms.
How Prometheus Works
Prometheus follows a clean, well-defined architecture built around the collection and storage of time-series data. Understanding this architecture is essential before deploying it in production.
Core Workflow
- Metric Collection (Scraping): Prometheus periodically sends HTTP requests to configured target endpoints — known as *exporters* — to collect metrics. The scrape interval is fully configurable.
- Time-Series Storage: Collected metrics are persisted in Prometheus's built-in time-series database (TSDB). Each data point carries a Unix timestamp and a set of key-value labels for identification.
- Querying with PromQL: Engineers use PromQL to query stored metrics, generate graphs, build dashboards, or define alert conditions based on real-time and historical data.
- Alerting Pipeline: When an alerting rule's expression holds for its configured duration, Prometheus sends the alert to Alertmanager, which deduplicates, groups, and routes notifications to the appropriate channels.
The pull-based model is a deliberate architectural choice. It simplifies network security (targets don't need outbound access to a central server), makes configuration transparent, and scales well in distributed environments.
Prometheus Core Components
Prometheus is a modular ecosystem. Each component serves a specific role:
1. Prometheus Server
The central engine responsible for scraping targets, storing metrics in the TSDB, evaluating alerting rules, and serving the PromQL API and web UI.
2. Exporters
Exporters are lightweight agents or adapters that expose metrics in a Prometheus-compatible format. Key exporters include:
- Node Exporter: Collects hardware and OS-level metrics such as CPU, memory, disk I/O, network throughput, and filesystem usage
- Blackbox Exporter: Probes external endpoints over HTTP, HTTPS, DNS, TCP, and ICMP for availability and latency
- Database Exporters: Dedicated exporters exist for PostgreSQL, MySQL, Redis, MongoDB, and many others
- Application-Specific Exporters: Many modern applications and frameworks expose a /metrics endpoint natively
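To make the exporter output concrete, here is a minimal sketch of what a scrape returns and how it can be read from the shell. The metric names are real Node Exporter metrics, but the values are made up, and the helper functions `sample_metrics` and `list_samples` are hypothetical names used only for this illustration. The parsing assumes label values contain no spaces.

```shell
# Sample scrape payload in the Prometheus text exposition format.
# Values are invented for illustration.
sample_metrics() {
cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 81234.5
node_cpu_seconds_total{cpu="0",mode="user"} 1520.25
EOF
}

# Drop comment lines (# HELP, # TYPE) and print "<series> => <value>".
list_samples() {
  awk '!/^#/ && NF >= 2 { print $1 " => " $2 }'
}

sample_metrics | list_samples
```

Each non-comment line is one sample: a metric name, an optional set of labels in braces, and the current value.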
3. Alertmanager
Handles the full alerting lifecycle: receiving alerts from Prometheus, deduplicating and grouping them, applying silences and inhibition rules, and routing notifications to receivers such as Slack, email, PagerDuty, or OpsGenie.
4. PromQL (Prometheus Query Language)
A purpose-built functional query language for time-series data. PromQL supports instant vectors, range vectors, aggregation operators, mathematical functions, and subqueries — giving you deep analytical power over your metrics.
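As a rough illustration of what a range-vector function such as `rate()` computes, the sketch below derives a per-second increase from a handful of counter samples. The samples and the helper names are hypothetical, and the calculation is deliberately simplified: the real `rate()` also compensates for counter resets and extrapolates to the edges of the window.

```shell
# Hypothetical counter samples: "<unix_timestamp> <value>", one per 15s scrape.
counter_samples() {
cat <<'EOF'
1700000000 1000
1700000015 1030
1700000030 1075
1700000045 1120
1700000060 1180
EOF
}

# Simplified rate: (last value - first value) / elapsed seconds.
simple_rate() {
  awk 'NR == 1 { t0 = $1; v0 = $2 }
       { t1 = $1; v1 = $2 }
       END { if (t1 > t0) printf "%.2f\n", (v1 - v0) / (t1 - t0) }'
}

counter_samples | simple_rate   # prints 3.00
```

Here the counter grew by 180 over 60 seconds, so the average rate across the window is 3 per second.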
5. Pushgateway
Designed for ephemeral or batch jobs that cannot be scraped directly (e.g., a cron job that runs for 30 seconds). These jobs push their metrics to the Pushgateway, which Prometheus then scrapes on its regular interval.
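A minimal sketch of how a batch job might push a metric before it exits. The Pushgateway address, the job name `nightly_backup`, and the metric `backup_duration_seconds` are assumptions made for this example; adjust them to your environment. The push itself is wrapped in a function so the payload can be inspected without a running Pushgateway.

```shell
# Assumed values: Pushgateway address, job name, and metric name are hypothetical.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"
JOB_NAME="nightly_backup"

# Build the payload in the text exposition format.
build_payload() {
  duration_seconds="$1"
  printf '# TYPE backup_duration_seconds gauge\n'
  printf 'backup_duration_seconds %s\n' "$duration_seconds"
}

# Send the payload; pushed metrics are grouped under /metrics/job/<job name>.
push_metrics() {
  build_payload "$1" | curl --silent --show-error --data-binary @- \
    "$PUSHGATEWAY_URL/metrics/job/$JOB_NAME"
}

build_payload 27.5
# At the end of the real job: push_metrics 27.5
```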
6. Grafana
While not part of Prometheus itself, Grafana is the de facto visualization layer for Prometheus data. It connects to Prometheus as a data source and enables the creation of rich, interactive dashboards with panels, variables, and annotations.
Step-by-Step Prometheus Setup on Linux
The following guide walks you through a complete Prometheus deployment on a Linux server, including Node Exporter for system metrics, Grafana for visualization, and Alertmanager for notifications.
> Prerequisites: A Linux server (Ubuntu 20.04/22.04 or CentOS/RHEL 8+), sudo or root access, and basic familiarity with the command line. If you need a reliable server environment, consider AlexHost VPS Hosting for a performant, low-latency foundation.
Step 1: Install Prometheus
Download and extract the latest Prometheus release:
wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
tar -xvf prometheus-2.47.0.linux-amd64.tar.gz
cd prometheus-2.47.0.linux-amd64
> Tip: Always check the official Prometheus releases page for the latest stable version before downloading.
Create a dedicated system user and directory structure:
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
# Install the bundled sample configuration so the service has a config file to load
sudo cp prometheus.yml /etc/prometheus/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
Create a systemd service for Prometheus:
sudo nano /etc/systemd/system/prometheus.service
Paste the following content:
[Unit]
Description=Prometheus Monitoring System
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d
[Install]
WantedBy=multi-user.target
Enable and start Prometheus:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
Prometheus will now be accessible at http://your-server-ip:9090.
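Beyond `systemctl status`, the server's own management endpoints can confirm that it is actually serving requests. The sketch below assumes Prometheus listens on `localhost:9090`; the `check_endpoint` helper is a hypothetical name for this example.

```shell
# Assumed address; override PROM_URL if Prometheus listens elsewhere.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Return 0 if the endpoint answers with a successful HTTP status, non-zero otherwise.
check_endpoint() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# /-/healthy reports that the process is up; /-/ready that it can serve traffic.
for path in /-/healthy /-/ready; do
  if check_endpoint "$PROM_URL$path"; then
    echo "$path: ok"
  else
    echo "$path: not reachable"
  fi
done
```

A script like this can also serve as a simple post-deploy check in automation, since it exits cleanly whether or not the server responds.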
Step 2: Configure Prometheus
The primary configuration file is /etc/prometheus/prometheus.yml. This file defines global settings and the scrape targets Prometheus monitors.
Basic configuration example:
global:
  scrape_interval: 15s # Default scrape frequency
  evaluation_interval: 15s # How often alerting rules are evaluated
  scrape_timeout: 10s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - "/etc/prometheus/rules/*.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
After modifying this file, always validate the configuration before restarting:
promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus
Step 3: Install Node Exporter for System Metrics
Node Exporter exposes detailed hardware and OS metrics from the host system — essential for monitoring CPU load, memory pressure, disk utilization, and network throughput.
Download and install Node Exporter:
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
Create a systemd service for Node Exporter:
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
Node Exporter now exposes metrics at http://your-server-ip:9100/metrics.
Add Node Exporter as a scrape target in prometheus.yml:
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]
Restart Prometheus and verify the target appears as UP in the Prometheus UI under Status → Targets.
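To get a feel for the raw data Node Exporter provides, the following sketch computes memory usage from two of its gauges using the common ratio (1 - MemAvailable / MemTotal) * 100. The sample values are invented and the helper names are hypothetical; against a live exporter, the same function can read from `curl`.

```shell
# Hypothetical excerpt of Node Exporter output (values are made up).
sample_node_metrics() {
cat <<'EOF'
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8.0e+09
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2.0e+09
EOF
}

# Percentage of memory in use: (1 - MemAvailable / MemTotal) * 100
memory_used_percent() {
  awk '$1 == "node_memory_MemTotal_bytes"     { total = $2 }
       $1 == "node_memory_MemAvailable_bytes" { avail = $2 }
       END { if (total > 0) printf "%.1f\n", (1 - avail / total) * 100 }'
}

sample_node_metrics | memory_used_percent   # prints 75.0
# Against a live exporter:
#   curl -s http://localhost:9100/metrics | memory_used_percent
```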
Step 4: Visualize Metrics with Grafana
Grafana transforms raw Prometheus metrics into actionable, visually rich dashboards. It is the standard visualization layer for Prometheus deployments.
Install Grafana on Ubuntu/Debian:
sudo apt-get install -y apt-transport-https software-properties-common
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Connect Prometheus as a data source:
- Open Grafana at http://your-server-ip:3000 (default credentials: admin/admin)
- Navigate to Configuration → Data Sources → Add data source
- Select Prometheus
- Set the URL to http://localhost:9090
- Click Save & Test
Import a pre-built dashboard:
- Go to Dashboards → Import
- Enter dashboard ID 1860 (Node Exporter Full) from Grafana's marketplace
- Select your Prometheus data source and click Import
You will immediately have a comprehensive view of CPU usage, memory consumption, disk I/O, network statistics, and system load — all in a single interactive dashboard.
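If you prefer configuration files over clicking through the UI, the data source can also be described in a Grafana provisioning file. The sketch below writes such a file to a scratch directory so it can be reviewed first; the `OUT_DIR` variable and the file name `prometheus.yaml` are assumptions for this example, and the target path shown in the comments is the default for package installs.

```shell
# Scratch location for review; a temporary directory unless OUT_DIR is set.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# Data source definition equivalent to the manual steps (name is arbitrary).
cat > "$OUT_DIR/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

cat "$OUT_DIR/prometheus.yaml"
# To apply on a package install:
#   sudo cp "$OUT_DIR/prometheus.yaml" /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```

Keeping this file in version control makes the Grafana setup reproducible across servers.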
Step 5: Configure Alerting Rules and Alertmanager
Prometheus alerting consists of two parts: alerting rules defined in Prometheus, and the Alertmanager that handles routing and delivery.
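An alerting rule pairs an expression with a `for` duration: the alert is pending while the expression is true and only fires once it has stayed true for that long. The sketch below simulates this with hypothetical integer CPU readings, a threshold of 80, a `for` of 60 seconds, and a 15-second evaluation interval. It is a simplified model of the behavior, not Prometheus's implementation.

```shell
# Hypothetical settings for the simulation.
THRESHOLD=80        # alert condition: value > 80
FOR_SECONDS=60      # the rule's "for" duration
EVAL_INTERVAL=15    # seconds between rule evaluations

# Print the alert state after each evaluation of the given readings.
alert_states() {
  pending_since=""
  t=0
  for value in "$@"; do
    if [ "$value" -gt "$THRESHOLD" ]; then
      if [ -z "$pending_since" ]; then
        pending_since=$t
      fi
      if [ $((t - pending_since)) -ge "$FOR_SECONDS" ]; then
        state=firing
      else
        state=pending
      fi
    else
      # Condition no longer holds: the alert resets.
      pending_since=""
      state=inactive
    fi
    echo "t=${t}s value=${value} state=${state}"
    t=$((t + EVAL_INTERVAL))
  done
}

alert_states 70 85 90 95 88 91 60 85
```

In this run the condition first holds at t=15s, the alert fires at t=75s, and a single reading below the threshold at t=90s resets it, so the next breach starts a new pending period.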
Create an alerting rules file:
sudo mkdir -p /etc/prometheus/rules
sudo nano /etc/prometheus/rules/system_alerts.yml
groups:
  - name: system_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage has exceeded 80% for more than 5 minutes. Current value: {{ $value }}%"
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Available disk space on / is below 15%. Immediate action required."
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage has exceeded 85% for more than 5 minutes."
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Prometheus target {{ $labels.instance }} has been unreachable for more than 1 minute."
Validate the rules file:
promtool check rules /etc/prometheus/rules/system_alerts.yml
Install and configure Alertmanager:
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo mkdir /etc/alertmanager
Create the Alertmanager configuration:
sudo nano /etc/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.yourdomain.com:587'
  smtp_from: 'alerts@yourdomain.com'
  smtp_auth_username: 'alerts@yourdomain.com'
  smtp_auth_password: 'your_password'
route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-notifications'
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@yourdomain.com'
        send_resolved: true
> Note: For professional email delivery in your alerting pipeline, consider pairing Prometheus with AlexHost Email Hosting for reliable SMTP infrastructure.
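To see what `group_by: ['alertname', 'instance']` means in practice, the sketch below groups a few hypothetical firing alerts by those two labels. Alerts that share both values land in the same group and would be delivered in one notification. The alert list, the third column (a mountpoint label, `-` when absent), and the helper names are invented for illustration; this models the grouping only, not Alertmanager's timing logic.

```shell
# Hypothetical firing alerts: "<alertname> <instance> <mountpoint>".
firing_alerts() {
cat <<'EOF'
HighCPUUsage web-1:9100 -
HighCPUUsage web-2:9100 -
LowDiskSpace db-1:9100 /
LowDiskSpace db-1:9100 /var
InstanceDown db-1:9100 -
EOF
}

# Group by (alertname, instance) and print the number of alerts per group.
group_alerts() {
  awk '{ count[$1 " " $2]++ }
       END { for (key in count) print key, count[key] }' | LC_ALL=C sort
}

firing_alerts | group_alerts
```

Five alerts collapse into four groups: the two LowDiskSpace alerts on db-1 differ only in mountpoint, which is not part of the grouping key, so they arrive together.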
Advanced Configuration: Service Discovery
For dynamic environments — such as Kubernetes clusters or auto-scaling server fleets — static target lists quickly become unmanageable. Prometheus supports multiple service discovery mechanisms out of the box:
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
Supported service discovery backends include Kubernetes, Consul, EC2, Azure, GCE, DNS-based discovery, and file-based discovery, making Prometheus adaptable to virtually any infrastructure topology.
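The `keep` action with `regex: true` in the relabeling rule is worth a closer look: relabel regexes are anchored at both ends and case-sensitive, so only pods whose annotation value is exactly `true` are scraped. The sketch below emulates that filter over a hypothetical list of discovered pods; the pod names and the helper functions are invented for this example.

```shell
# Hypothetical pods: "<pod name> <value of the prometheus.io/scrape annotation>".
discovered_pods() {
cat <<'EOF'
api-7d9f true
worker-5c2a false
cache-1b3e True
batch-9e8d truthy
EOF
}

# Emulate "action: keep" with "regex: true".
# The regex is anchored, so the whole value must match, not just a prefix.
keep_scrapable() {
  awk '$2 ~ /^(true)$/ { print $1 }'
}

discovered_pods | keep_scrapable   # prints api-7d9f
```

Values such as `True` or `truthy` are dropped, which is a common source of confusion when a pod is annotated but never shows up as a target.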
Prometheus Use Cases
Prometheus excels across a broad range of monitoring scenarios:
Infrastructure Monitoring
Collect server-level metrics — CPU, memory, disk, network — from every node in your fleet. Whether you're running a single Dedicated Server or a large cluster, Prometheus provides unified visibility across all hosts.
Application Performance Monitoring (APM)
Track request rates, error rates, response latencies (the RED method), and resource consumption for web applications and APIs. Prometheus client libraries are available for Go, Python, Java, Ruby, Node.js, and more.
Kubernetes and Container Monitoring
Prometheus integrates natively with Kubernetes through the kube-state-metrics and cAdvisor exporters, providing deep visibility into pod health, resource quotas, deployment status, and cluster-level metrics.
Database Monitoring
Monitor query performance, connection pool utilization, replication lag, and cache hit ratios for databases like PostgreSQL, MySQL, and Redis using dedicated exporters.
Custom Business Metrics
Instrument your own applications to expose domain-specific metrics — such as orders processed per second, active user sessions, or payment transaction rates — enabling business-level observability alongside technical metrics.
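Client libraries are the usual route for application code. For scripts and scheduled jobs that have no client library, one alternative is Node Exporter's textfile collector, which reads `.prom` files from the directory given by its `--collector.textfile.directory` flag. The sketch below writes a business metric to such a file; the metric name `shop_orders_processed_total`, the value, and the use of a temporary directory are assumptions for illustration.

```shell
# Hypothetical directory; on a real host, use the path passed to node_exporter
# via --collector.textfile.directory.
TEXTFILE_DIR="${TEXTFILE_DIR:-$(mktemp -d)}"

# Write the metric atomically: write to a temp file, then rename it,
# so the exporter never reads a half-written file.
write_orders_metric() {
  orders="$1"
  tmp_file="$TEXTFILE_DIR/orders.prom.$$"
  {
    echo '# HELP shop_orders_processed_total Orders processed since service start.'
    echo '# TYPE shop_orders_processed_total counter'
    echo "shop_orders_processed_total $orders"
  } > "$tmp_file"
  mv "$tmp_file" "$TEXTFILE_DIR/orders.prom"
}

write_orders_metric 1284
cat "$TEXTFILE_DIR/orders.prom"
```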
GPU Workload Monitoring
For teams running machine learning or high-performance compute workloads, Prometheus can integrate with DCGM exporters to monitor GPU utilization, memory, and temperature. This pairs well with AlexHost GPU Hosting for AI and ML infrastructure.
Prometheus vs. Alternative Monitoring Solutions
| Feature | Prometheus | Nagios | Zabbix | Datadog |
|---|---|---|---|---|
| Data Model | Time-series with labels | Check-based | Item-based | Time-series with tags |
| Collection Model | Pull (+ Pushgateway) | Active/Passive checks | Agent-based | Agent-based |
| Query Language | PromQL | None | Custom | Custom |
| Kubernetes Native | Yes (first-class) | Limited | Limited | Yes (paid) |
| Cost | Free / Open Source | Free / Open Source | Free / Open Source | Commercial SaaS |
| Scalability | High (with Thanos/Cortex) | Moderate | Moderate | High |
Production Best Practices
Deploying Prometheus in production requires attention to several operational concerns:
- Data Retention: The default retention period is 15 days. Adjust --storage.tsdb.retention.time based on your storage capacity and compliance requirements. For long-term storage, consider Thanos or Cortex.
- Security: Restrict access to the Prometheus web UI and API. Prometheus can serve TLS and basic authentication through a web configuration file (--web.config.file), but it has no user management or fine-grained authorization, so many deployments place it behind a reverse proxy (Nginx or Caddy) that handles authentication.
- High Availability: Run multiple Prometheus instances scraping the same targets for redundancy. Use Alertmanager's clustering feature to prevent duplicate notifications.
- Cardinality Management: Avoid labels with high-cardinality values (e.g., user IDs, request IDs). Every distinct combination of label values creates a separate time series, which can cause memory and performance issues.
- TLS Encryption: Enable TLS for scrape endpoints and the Prometheus API. Pair this with an SSL Certificate to secure all communications between Prometheus components.
- Resource Planning: Prometheus is memory-intensive. Memory use grows with the number of active time series, so size RAM from your expected series count. On disk, samples compress to roughly 1 to 2 bytes each, which is the figure to use when estimating storage.
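The retention, cardinality, and resource points above can be tied together with some back-of-the-envelope arithmetic. The sketch below estimates the active series count as metrics × instances × label combinations, then applies the usual disk formula: retention seconds × ingested samples per second × bytes per sample. All inputs are hypothetical, and the series estimate is a deliberate simplification, since real metrics rarely have uniform label sets.

```shell
# Arguments: metrics per instance, instances, label combinations per metric,
#            scrape interval (s), retention (days), bytes per sample on disk.
estimate_capacity() {
  awk -v metrics="$1" -v instances="$2" -v label_values="$3" \
      -v interval="$4" -v retention_days="$5" -v bytes_per_sample="$6" '
    BEGIN {
      series = metrics * instances * label_values      # active time series
      samples_per_sec = series / interval              # ingestion rate
      disk_bytes = retention_days * 86400 * samples_per_sec * bytes_per_sample
      printf "series=%d samples_per_sec=%.0f disk_gb=%.1f\n",
             series, samples_per_sec, disk_bytes / 1e9
    }'
}

# 500 metrics x 20 instances x 4 label combinations, 15s interval,
# 30 days retention, 2 bytes per sample.
estimate_capacity 500 20 4 15 30 2
# prints: series=40000 samples_per_sec=2667 disk_gb=13.8
```

Raising the label combinations from 4 to a few thousand, as a per-user label would, multiplies every figure by the same factor, which is why cardinality is usually the first thing to control.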
Frequently Asked Questions
Q: What is the difference between Prometheus and Grafana?
Prometheus is the monitoring and alerting backend — it collects, stores, and queries metrics. Grafana is a visualization frontend that connects to Prometheus (and other data sources) to render dashboards. They are complementary tools, not alternatives.
Q: Can Prometheus monitor Windows servers?
Yes. The Windows Exporter (formerly WMI Exporter) exposes Windows system metrics in a Prometheus-compatible format, covering CPU, memory, disk, network, IIS, and more.
Q: How does Prometheus handle high availability?
Prometheus itself is designed to be run as a single instance per cluster. For HA, you run two identical Prometheus servers scraping the same targets. Alertmanager supports native clustering to deduplicate alerts across multiple Prometheus instances.
Q: What is PromQL used for?
PromQL (Prometheus Query Language) is used to query time-series data stored in Prometheus. It supports instant queries, range queries, aggregations, mathematical operations, and functions — enabling everything from simple metric lookups to complex anomaly detection expressions.
Conclusion
Prometheus is a battle-tested, production-grade monitoring solution that provides deep, real-time visibility into the health and performance of your entire infrastructure stack. Its pull-based architecture, multi-dimensional data model, powerful PromQL query language, and seamless integrations with tools like Grafana and Alertmanager make it the gold standard for modern observability.
Whether you are monitoring a single Linux server, a fleet of Dedicated Servers, a Kubernetes cluster, or a complex microservices application, Prometheus scales to meet your needs. By following the steps in this guide — installing Prometheus, deploying Node Exporter, configuring Grafana dashboards, and setting up intelligent alerting rules — you will have a robust monitoring foundation that helps you detect anomalies early, respond to incidents faster, and continuously improve system reliability.
Start with the basics, iterate on your dashboards and alert thresholds as you learn your system's normal behavior, and progressively expand coverage to every layer of your stack. Prometheus is not just a monitoring tool — it is a cornerstone of modern site reliability engineering.
