Cron Job Monitoring Guide: Stop Silent Failures

Cron jobs fail silently. A missed backup, a skipped report, a broken sync: you won't know until something goes wrong downstream. Dead-man's-switch monitoring flips this: your job pings a service after every success, and silence triggers the alert. This guide covers setup for Healthchecks.io, Cronitor, Dead Man's Snitch, and a custom solution.

The Silent Failure Problem

Cron fires your script, your script exits non-zero (or hangs, or never starts), and… nothing. No email. No alert. No log entry in a visible location. Without monitoring, failures like this can go unnoticed until the downstream damage shows up. Monitoring is not optional for jobs that matter.

How Dead-Man's-Switch Monitoring Works

Unlike uptime monitoring (which pings your server from outside), dead-man's-switch monitoring requires your job to actively check in:

  1. Create a "check" in the monitoring service and get a unique ping URL.
  2. Set the expected schedule (e.g., "every day") and a grace period (e.g., "1 hour late is OK").
  3. At the end of your cron job (on success), make an HTTP GET to the ping URL.
  4. If no ping arrives within the grace period, the service sends an alert. Silence = problem.
Simple ping pattern
# Linux crontab
0 3 * * * /usr/local/bin/backup.sh && curl -fsS --retry 3 https://hc-ping.com/YOUR-UUID
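
On the service side, steps 2 and 4 reduce to a deadline comparison. The following is a minimal sketch of that logic, not the implementation of any particular service; the function name `is_overdue` and the choice to treat a never-pinged check as overdue are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_overdue(
    last_ping: Optional[datetime],
    period: timedelta,
    grace: timedelta,
    now: datetime,
) -> bool:
    """Return True when the next ping is later than period + grace allows."""
    if last_ping is None:
        # Assumption: a check that has never pinged counts as overdue.
        return True
    deadline = last_ping + period + grace
    return now > deadline


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 3, 0)  # last successful run at 03:00
    daily = timedelta(days=1)
    one_hour = timedelta(hours=1)
    # 30 minutes late is still inside the grace period
    print(is_overdue(last, daily, one_hour, datetime(2024, 1, 2, 3, 30)))
    # more than an hour late: silence becomes an alert
    print(is_overdue(last, daily, one_hour, datetime(2024, 1, 2, 4, 1)))
```

The job never reports failure explicitly in this model. The alert comes entirely from the missing ping, which is why hangs and jobs that never start are caught too.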

Healthchecks.io

Healthchecks.io is open-source (self-hostable) with a generous free tier (20 checks). It is one of the most popular standalone cron monitoring services.

Setup (3 steps)

  1. Create a check at healthchecks.io → New Check → set schedule (cron expression or period) + grace period.
  2. Copy the ping URL: it looks like https://hc-ping.com/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  3. Ping on success: add a curl call at the end of your job.

Linux crontab

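# Wrap your command: run job, then ping only on success.
# Keep each entry on one line: crontab does not support backslash continuation.
0 3 * * * /opt/scripts/nightly-backup.sh && curl -fsS --retry 3 https://hc-ping.com/YOUR-UUID > /dev/null

# Ping start + finish (shows duration in dashboard).
# The ";" after the start ping lets the job run even if that ping fails.
0 3 * * * curl -fsS https://hc-ping.com/YOUR-UUID/start > /dev/null; /opt/scripts/nightly-backup.sh && curl -fsS --retry 3 https://hc-ping.com/YOUR-UUID > /dev/null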

Node.js

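import cron from 'node-cron';
// runNightlyBackup() is your own job function; import it from wherever it lives.

const HC_URL = process.env.HC_BACKUP_URL!; // https://hc-ping.com/YOUR-UUID

// Best-effort ping with a 5 s timeout: a monitoring outage must never break the job
const ping = (suffix = '') =>
  fetch(`${HC_URL}${suffix}`, { signal: AbortSignal.timeout(5000) }).catch(() => {});

cron.schedule('0 3 * * *', async () => {
  await ping('/start'); // Optional: signal start

  try {
    await runNightlyBackup();
  } catch (err) {
    await ping('/fail'); // Signal failure
    console.error('Backup failed', err);
    return;
  }

  // Signal success outside the try block, so a failed ping is not reported as a failed backup
  await ping();
});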

Python

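import os
import requests

HC_URL = os.environ["HC_BACKUP_URL"]  # https://hc-ping.com/YOUR-UUID

def ping(suffix=""):
    """Best-effort ping: a monitoring outage must not break the job."""
    try:
        requests.get(HC_URL + suffix, timeout=5)
    except requests.RequestException:
        pass

def run_backup():
    ping("/start")
    try:
        do_backup()  # your own job function
    except Exception:
        ping("/fail")  # failure
        raise
    ping()  # success, sent only after do_backup() returns normally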

Cronitor

Cronitor offers a polished dashboard with job duration graphs, run history, and Slack/PagerDuty/SMS alerting. It has a CLI wrapper that makes integration trivial.

CLI Wrapper (easiest integration)

# Install Cronitor CLI
curl -sL https://cronitor.io/install | sudo bash

# Wrap your crontab entry — Cronitor does the rest
0 3 * * * cronitor exec nightly-backup /opt/scripts/nightly-backup.sh

# Or discover and wrap all existing cron jobs automatically
cronitor discover

Node.js SDK

npm install cronitor

import Cronitor from 'cronitor';

const cronitor = new Cronitor(process.env.CRONITOR_API_KEY!);

const monitor = cronitor.Monitor.put({
  type: 'job',
  key: 'nightly-backup',
  schedule: '0 3 * * *',
  notify: { alerts: ['default'] },
});

// Wrap your job
await monitor.run(async () => {
  await runNightlyBackup();
});

Dead Man's Snitch

Dead Man's Snitch is the simplest option — create a snitch, get a URL, ping it. No library needed, no start/fail variants, just a single GET request.

# crontab (keep the entry on one line: crontab does not support backslash continuation)
0 3 * * * /opt/scripts/backup.sh && curl https://nosnch.in/YOUR_SNITCH_TOKEN

# Node.js (after successful job)
await fetch('https://nosnch.in/YOUR_SNITCH_TOKEN');

Service Comparison

Service           | Free Tier          | Alert Channels                   | Self-Host     | Standout
------------------|--------------------|----------------------------------|---------------|----------------------------------
Healthchecks.io   | 20 checks          | Email, Slack, PagerDuty, webhook | ✓ open-source | Open-source, generous free tier
Cronitor          | 5 monitors         | Email, Slack, PagerDuty, SMS     |               | CI/CD integration, dashboard
Dead Man's Snitch | 1 snitch           | Email only (free)                |               | Simplest possible setup
Better Uptime     | 10 heartbeats      | Email, Slack, phone              |               | Combined uptime + cron monitoring
Custom webhook    | Free (self-hosted) | Anything                         |               | Full control, no vendor dependency

Build Your Own (Free)

For internal tools or self-hosted setups, a simple Node.js heartbeat checker is 30 lines:

// heartbeat-store.ts (in-memory, or use Redis)
const lastPing: Record<string, Date> = {};

export function recordPing(jobId: string) {
  lastPing[jobId] = new Date();
}

export function checkStale(jobId: string, maxAgeMs: number): boolean {
  const last = lastPing[jobId];
  if (!last) return true; // never pinged = stale
  return Date.now() - last.getTime() > maxAgeMs;
}

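// watchdog.ts: runs every hour (in the same process as the store when it is in-memory)
import cron from 'node-cron';
import { checkStale } from './heartbeat-store';

// Placeholder notifier: replace with your own email, Slack, or webhook call
function sendAlert(message: string) {
  console.error(`[ALERT] ${message}`);
}

cron.schedule('0 * * * *', () => {
  // Backup should have run in the last 25 hours
  if (checkStale('nightly-backup', 25 * 60 * 60 * 1000)) {
    sendAlert('Nightly backup has not run in 25+ hours!');
  }
});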
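
An in-memory store loses every timestamp on restart, and after a restart each job looks "never pinged". As a rough sketch of a persistent variant, written in Python rather than Node.js, the heartbeat can live in a file's modification time instead. The function names `record_ping` and `is_stale` and the file layout are illustrative assumptions; this swaps a heartbeat file in for the in-memory map or Redis and is not a drop-in replacement for either.

```python
import time
from pathlib import Path
from typing import Optional


def record_ping(heartbeat_file: Path) -> None:
    """Call at the end of a successful job: bumps the file's mtime to now."""
    heartbeat_file.parent.mkdir(parents=True, exist_ok=True)
    heartbeat_file.touch()


def is_stale(
    heartbeat_file: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Return True if the job has not pinged within max_age_seconds."""
    current = time.time() if now is None else now
    try:
        last_ping = heartbeat_file.stat().st_mtime
    except FileNotFoundError:
        return True  # never pinged = stale, same rule as the in-memory store
    return current - last_ping > max_age_seconds
```

A watchdog would call `is_stale(path, 25 * 60 * 60)` on a schedule for the nightly backup. Because the timestamp sits on disk, the job and the watchdog can run as separate processes.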

What to Monitor & Alert On

Always monitor

  • Database backups
  • Financial reports / invoicing
  • Data sync jobs (ETL, API pulls)
  • User notification delivery
  • Certificate renewal jobs

Consider monitoring

  • Search index rebuilds
  • Cache warm-up jobs
  • Analytics aggregation
  • Email digest delivery
  • Health check sweeps

Alert thresholds

  • Grace period = 20–25% of job interval
  • Duration alert if job runs 3× longer than usual
  • Failure alert after 1 miss (critical jobs)
  • Failure alert after 2–3 misses (non-critical)
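
The thresholds above are simple arithmetic, and a small sketch can make them concrete. The helper names are hypothetical, "usual" duration is taken to be the median of recent runs, and the non-critical miss threshold is fixed at 2. All three are assumptions to adjust for your own jobs.

```python
from datetime import timedelta
from statistics import median
from typing import Sequence


def grace_period(interval: timedelta, fraction: float = 0.25) -> timedelta:
    """Grace period as a fraction (20-25%) of the job interval."""
    return interval * fraction


def runs_too_long(
    duration_s: float,
    recent_durations_s: Sequence[float],
    factor: float = 3.0,
) -> bool:
    """Flag a run that took more than `factor` times the usual duration."""
    if not recent_durations_s:
        return False  # no history yet, nothing to compare against
    usual = median(recent_durations_s)  # assumption: "usual" = median
    return duration_s > factor * usual


def should_alert(consecutive_misses: int, critical: bool) -> bool:
    """Critical jobs alert on the first miss, others on the second."""
    threshold = 1 if critical else 2
    return consecutive_misses >= threshold


if __name__ == "__main__":
    print(grace_period(timedelta(hours=24)))  # a daily job gets a 6-hour grace period
    print(runs_too_long(400.0, [100.0, 120.0, 110.0]))
    print(should_alert(1, critical=False))
```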

What not to over-alert on

  • Jobs that intentionally run only on business days
  • Jobs with known variance (DST, leap year)
  • Development/staging jobs with noisy schedules
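
The business-day case shows why a fixed "maximum age" check over-alerts: a job that runs Monday to Friday is legitimately silent for about 72 hours over a weekend. One way to handle it, sketched here under the assumption of a job scheduled at 03:00 on weekdays with naive local datetimes, is to compare the last ping against the most recent run that was actually expected. The function names and the one-hour grace default are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional


def last_business_run(at: datetime, run_hour: int = 3) -> datetime:
    """Most recent Mon-Fri scheduled run time at or before `at`."""
    candidate = at.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if candidate > at:
        candidate -= timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate -= timedelta(days=1)
    return candidate


def missed_run(
    last_ping: Optional[datetime],
    now: datetime,
    grace: timedelta = timedelta(hours=1),
    run_hour: int = 3,
) -> bool:
    """True if the latest run whose grace period has expired sent no ping."""
    due = last_business_run(now - grace, run_hour)
    return last_ping is None or last_ping < due


if __name__ == "__main__":
    friday_ping = datetime(2024, 1, 5, 3, 20)  # Friday's run succeeded
    # Sunday noon: no weekend run was expected
    print(missed_run(friday_ping, datetime(2024, 1, 7, 12, 0)))
    # Monday 04:30: Monday's run is past its grace period with no ping
    print(missed_run(friday_ping, datetime(2024, 1, 8, 4, 30)))
```

Hosted services cover the same case by accepting a cron expression such as `0 3 * * 1-5` as the schedule, which is usually simpler than maintaining this logic yourself.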

Quick Recommendation

  • Healthchecks.io: Best overall — generous free tier, open-source, start/fail signals, Slack integration.
  • Cronitor: Best for teams that want a dashboard with duration history and CI/CD integration.
  • Dead Man's Snitch: Simplest possible setup for a single critical job — one URL, one curl.