[Prometheus & Grafana] Chapter 5. Jobs and Instances

Note: This post is a summary based on the official Prometheus (v3.2.1) and Grafana documentation. For precise details, please refer to the official docs.


Every metric Prometheus collects carries two labels that identify its origin: job and instance. These are not arbitrary tags -- they reflect Prometheus's fundamental model for organizing scrape targets. Understanding this model is the final piece of Part 02's data model coverage.


5.1 Instance: The Scrape Endpoint

An Instance is a single endpoint that Prometheus can scrape. It typically corresponds to one process and is identified by a <host>:<port> pair.

localhost:9090   <- Prometheus itself
10.0.1.5:9100    <- Node Exporter
10.0.1.5:4000    <- Web application

Each of these is an Instance. One host can run multiple Instances on different ports, and each is tracked independently.
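Because each endpoint is tracked on its own, a single Instance can be addressed directly in a query. A small sketch using the auto-generated up metric (covered in 5.4) against one of the example endpoints above:

```
# Health of one specific endpoint, regardless of how many
# other processes run on the same host
up{instance="10.0.1.5:4000"}
```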


5.2 Job: A Logical Group of Same-Purpose Instances

A Job is a logical group of replicated Instances that serve the same purpose. Running multiple copies of the same process for scalability or availability is standard practice -- a Job bundles them under one name.

# prometheus.yml
scrape_configs:
  - job_name: 'api-server'
    static_configs:
      - targets:
          - '10.0.1.5:5670'
          - '10.0.1.5:5671'
          - '10.0.2.5:5670'
          - '10.0.2.5:5671'

The api-server Job above contains four Instances; a tree view makes the relationship clear.

Job: api-server
├── Instance: 10.0.1.5:5670
├── Instance: 10.0.1.5:5671
├── Instance: 10.0.2.5:5670
└── Instance: 10.0.2.5:5671

Job: node-exporter
├── Instance: 10.0.1.5:9100
└── Instance: 10.0.2.5:9100

A Job groups what is logically the same service. An Instance pinpoints exactly which process within that service produced a given metric.
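This hierarchy maps directly onto PromQL aggregation. A sketch, assuming a counter such as http_requests_total exposed by the api-server Instances above:

```
# Per-service request rate: collapse all Instances into one series per Job
sum by (job) (rate(http_requests_total[5m]))

# Drill down: one series per Instance, to spot an outlier replica
sum by (job, instance) (rate(http_requests_total[5m]))
```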


5.3 Auto-Generated Labels: job and instance

Prometheus automatically attaches two labels to every scraped metric.

Label      Value                                    Example
job        The job_name from the scrape config      api-server
instance   The <host>:<port> of the scrape target   10.0.1.5:5670

Every collected metric is therefore traceable to its exact origin.

http_requests_total{job="api-server", instance="10.0.1.5:5670", method="GET"} = 1234
http_requests_total{job="api-server", instance="10.0.1.5:5671", method="GET"} = 5678

honor_labels

A conflict arises when a scrape target already exposes its own job or instance labels. The honor_labels setting resolves this.

honor_labels      Behavior
false (default)   Renames the target's labels to exported_job and exported_instance; uses the Prometheus-assigned labels
true              Uses the target's labels as-is; Prometheus-assigned labels are discarded

Federation setups typically use honor_labels: true to preserve the original labels from upstream Prometheus servers.
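A minimal federation scrape config might look like the sketch below. The upstream address and the match[] selector are illustrative assumptions, not fixed values:

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true            # keep job/instance labels set by the upstream server
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="api-server"}'    # pull only this Job's series (example selector)
    static_configs:
      - targets:
          - 'upstream-prometheus:9090'  # hypothetical upstream address
```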


5.4 Auto-Generated Metrics

Beyond labels, Prometheus generates several metrics per scrape target automatically. These are essential for monitoring the health of the monitoring system itself.

The up Metric

up is the most important auto-generated metric: it indicates whether the most recent scrape of a target succeeded.

Value   Meaning
1       Scrape successful -- instance is up
0       Scrape failed -- instance is down or unreachable

# Find all downed instances
up == 0

# Healthy instance ratio for a specific Job
avg(up{job="api-server"})

An alert rule on up == 0 is often the first alert any Prometheus deployment configures. If up is 0, everything else about that target is unknown.
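Such a rule might be written as follows. The group name, severity value, and the 5-minute grace period are choices for illustration, not requirements:

```yaml
# alert_rules.yml (hypothetical file name)
groups:
  - name: instance-health
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m                   # tolerate brief scrape hiccups before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} of job {{ $labels.job }} is down"
```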

Other Auto-Generated Metrics

Metric                                  Description
scrape_duration_seconds                 Time taken to complete the scrape
scrape_samples_scraped                  Number of samples collected
scrape_samples_post_metric_relabeling   Samples remaining after metric relabeling
scrape_series_added                     New time series added in this scrape (v2.10+)

extra-scrape-metrics Feature Flag

Enabling --enable-feature=extra-scrape-metrics exposes additional scrape diagnostics.

Metric                   Description
scrape_timeout_seconds   Configured scrape timeout
scrape_sample_limit      Configured sample limit (0 = unlimited)
scrape_body_size_bytes   Uncompressed size of the last scrape response

Practical PromQL

These auto-generated metrics become powerful when combined in queries.

# Targets where scraping takes over 3 seconds (timeout risk)
scrape_duration_seconds > 3

# Targets where sample count doubled compared to 1 hour ago (cardinality explosion suspect)
scrape_samples_scraped / scrape_samples_scraped offset 1h > 2

# Instance health summary by Job
count by (job) (up == 1)
count by (job) (up == 0)

The scrape_duration_seconds > 3 query is particularly useful. If a target consistently approaches the scrape timeout, it will eventually start failing -- catching it early prevents gaps in your data.


Part 02 Recap

This chapter concludes Part 02. The table below summarizes every concept covered across Chapter 3 (Data Model), Chapter 4 (Metric Types), and this chapter.

Concept       Definition                                               Key Point
Time Series   Time-ordered values identified by metric name + labels   Fundamental data unit
Metric Name   Describes what is measured                               prefix + base unit + suffix convention
Labels        Key-value pairs for multi-dimensional distinction        Cardinality management is essential
Counter       Monotonically increasing cumulative value                Use with rate(), _total suffix
Gauge         Mutable snapshot value                                   Direct query, predict_linear()
Histogram     Bucket-based distribution                                Server-side aggregation, histogram_quantile()
Summary       Client-side quantiles                                    Not aggregatable, precise quantiles
Job           Logical group of same-purpose instances                  Auto job label
Instance      Single scrape endpoint                                   Auto instance label, up metric

Part 02 established the theoretical foundation -- what Prometheus stores and how it categorizes that data. Part 03 shifts to practice. The next chapter covers installation of Prometheus and Grafana.
