Cron Job Monitoring Guide: Stop Silent Failures
Cron jobs fail silently. A missed backup, a skipped report, a broken sync: you won't know until something goes wrong downstream. Dead-man's-switch monitoring flips this: your job pings a service after every success, and silence triggers the alert. This guide covers setup for Healthchecks.io, Cronitor, Dead Man's Snitch, and a custom solution.
The Silent Failure Problem
Cron fires your script, your script exits non-zero (or hangs, or never starts), and… nothing. Unless mail delivery is configured on the host, there is no email, no alert, and no log entry anywhere you would think to look. Silent cron failures are routine in production, and they usually surface only when someone needs the output. Monitoring is not optional for jobs that matter.
How Dead-Man's-Switch Monitoring Works
Unlike uptime monitoring (which pings your server from outside), dead-man's-switch monitoring requires your job to actively check in:
1. Create a "check" in the monitoring service and get a unique ping URL.
2. Set the expected schedule (e.g., "every day") and a grace period (e.g., "1 hour late is OK").
3. At the end of your cron job (on success), make an HTTP GET to the ping URL.
4. If no ping arrives within the grace period, the service sends an alert. Silence = problem.
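The service-side half of these steps can be sketched in a few lines of Python. This is an illustration of the idea only, not any vendor's actual implementation; the function name `check_status` and the state names are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_status(last_ping: Optional[datetime], period: timedelta,
                 grace: timedelta, now: datetime) -> str:
    """Classify a check the way a dead-man's-switch service might.

    "new"  : no ping has ever arrived
    "up"   : the last ping is within the expected period
    "late" : the period has elapsed, but the grace window has not
    "down" : period + grace elapsed with no ping, so an alert is due
    """
    if last_ping is None:
        return "new"
    elapsed = now - last_ping
    if elapsed <= period:
        return "up"
    if elapsed <= period + grace:
        return "late"
    return "down"


# Daily job with a 1-hour grace period, last seen at 03:00 UTC
last = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
day, hour = timedelta(days=1), timedelta(hours=1)
print(check_status(last, day, hour, last + timedelta(hours=20)))    # up
print(check_status(last, day, hour, last + timedelta(hours=24.5)))  # late
print(check_status(last, day, hour, last + timedelta(hours=26)))    # down
```

The job never reports its own failure here: the alert comes from the absence of a ping, which is why a job that hangs or never starts is still caught.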
# Linux crontab
0 3 * * * /usr/local/bin/backup.sh && curl -fsS --retry 3 https://hc-ping.com/YOUR-UUID
Healthchecks.io
Healthchecks.io is open-source (self-hostable) with a generous free tier (20 checks). It's the most popular standalone cron monitoring service.
Setup (3 steps)
1. Create a check at healthchecks.io → New Check → set schedule (cron expression or period) + grace period.
2. Copy the ping URL, which looks like https://hc-ping.com/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
3. Ping on success: add a curl call at the end of your job.
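When the job is launched from Python rather than directly from crontab, step 3 can be handled by a small wrapper. The sketch below is one possible shape, using only the standard library; `run_and_ping`, `http_ping`, and the exit codes 124 and 127 are choices made for this example, and `PING_URL` is the placeholder URL from the steps above. It also treats a hung job as a failure by enforcing a timeout.

```python
import subprocess
import urllib.request
from typing import Callable, Optional, Sequence

PING_URL = "https://hc-ping.com/YOUR-UUID"  # placeholder, replace with your own


def http_ping(url: str) -> None:
    """Best-effort GET: a monitoring outage must never break the job."""
    try:
        urllib.request.urlopen(url, timeout=10).close()
    except Exception:
        pass  # swallow network and HTTP errors on purpose


def run_and_ping(cmd: Sequence[str], timeout_s: Optional[float] = None,
                 ping: Callable[[str], None] = http_ping) -> int:
    """Run cmd, then ping the success URL or the /fail URL."""
    ping(PING_URL + "/start")
    try:
        code = subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        code = 124  # the job hung and was killed (same code as timeout(1))
    except OSError:
        code = 127  # command missing or not executable
    ping(PING_URL if code == 0 else PING_URL + "/fail")
    return code
```

A call such as `run_and_ping(["/opt/scripts/nightly-backup.sh"], timeout_s=3600)` would cover the three failure modes described earlier: a non-zero exit, a hang, and a command that cannot start. The `ping` parameter exists so the wrapper can be exercised without network access.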
Linux crontab
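# Wrap your command: run job, then ping only on success.
# A crontab entry must fit on one line; cron does not honor backslash continuations.
0 3 * * * /opt/scripts/nightly-backup.sh && curl -fsS -m 10 --retry 3 https://hc-ping.com/YOUR-UUID > /dev/null
# Ping start + finish (shows duration in dashboard).
# The start ping ends with ";" so a failed ping never prevents the job from running.
0 3 * * * curl -fsS -m 10 https://hc-ping.com/YOUR-UUID/start > /dev/null; /opt/scripts/nightly-backup.sh && curl -fsS -m 10 --retry 3 https://hc-ping.com/YOUR-UUID > /dev/null
Node.js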
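import cron from 'node-cron';
const HC_URL = process.env.HC_BACKUP_URL!; // https://hc-ping.com/YOUR-UUID
// Best-effort ping with a timeout: a monitoring outage must never fail the job
const ping = (suffix = '') =>
  fetch(`${HC_URL}${suffix}`, { signal: AbortSignal.timeout(10_000) }).catch(() => {});
cron.schedule('0 3 * * *', async () => {
  await ping('/start'); // Optional: signal start
  try {
    await runNightlyBackup(); // your job, defined elsewhere
  } catch (err) {
    await ping('/fail'); // Signal failure
    logger.error({ err }, 'Backup failed'); // your logger, defined elsewhere
    return;
  }
  // Signal success outside the try block, so a failed ping is not reported as a failed backup
  await ping();
});
Python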
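import os
import requests
HC_URL = os.environ["HC_BACKUP_URL"]  # https://hc-ping.com/YOUR-UUID
def ping(suffix=""):
    # Best-effort: a monitoring outage must never fail the job itself
    try:
        requests.get(HC_URL + suffix, timeout=5)
    except requests.RequestException:
        pass
def run_backup():
    ping("/start")
    try:
        do_backup()  # your job, defined elsewhere
    except Exception:
        ping("/fail")  # failure
        raise
    ping()  # success, sent outside the try so a ping error is not mistaken for a job error
Cronitor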
Cronitor offers a polished dashboard with job duration graphs, run history, and Slack/PagerDuty/SMS alerting. It has a CLI wrapper that makes integration trivial.
CLI Wrapper (easiest integration)
# Install Cronitor CLI
curl -sL https://cronitor.io/install | sudo bash
# Wrap your crontab entry — Cronitor does the rest
0 3 * * * cronitor exec nightly-backup /opt/scripts/nightly-backup.sh
# Or discover and wrap all existing cron jobs automatically
cronitor discover
Node.js SDK
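npm install cronitor
import Cronitor from 'cronitor';
const cronitor = new Cronitor(process.env.CRONITOR_API_KEY!);
// Create or update the monitor definition; put() is asynchronous.
// Call shapes here follow the cronitor package README; check them against the version you install.
await cronitor.Monitor.put({
  type: 'job',
  key: 'nightly-backup',
  schedule: '0 3 * * *',
  notify: { alerts: ['default'] },
});
// Wrap your job: the wrapper reports start, completion, and failure
const runBackup = cronitor.wrap('nightly-backup', async () => {
  await runNightlyBackup();
});
await runBackup();
Dead Man's Snitch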
Dead Man's Snitch is the simplest option — create a snitch, get a URL, ping it. No library needed, no start/fail variants, just a single GET request.
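# crontab (an entry must fit on one line; cron does not honor backslash continuations)
0 3 * * * /opt/scripts/backup.sh && curl -fsS -m 10 --retry 3 https://nosnch.in/YOUR_SNITCH_TOKEN > /dev/null
# Node.js (after successful job)
await fetch('https://nosnch.in/YOUR_SNITCH_TOKEN');
Service Comparison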
| Service | Free Tier | Alert Channels | Self-Host | Standout |
|---|---|---|---|---|
| Healthchecks.io | 20 checks | Email, Slack, PD, webhook | ✓ open-source | Open-source, generous free tier |
| Cronitor | 5 monitors | Email, Slack, PD, SMS | ✗ | CI/CD integration, dashboard |
| Dead Man's Snitch | 1 snitch | Email only (free) | ✗ | Simplest possible setup |
| Better Uptime | 10 heartbeats | Email, Slack, phone | ✗ | Combined uptime + cron monitoring |
| Custom webhook | Free (self-hosted) | Anything | ✓ | Full control, no vendor dependency |
Build Your Own (Free)
For internal tools or self-hosted setups, a simple Node.js heartbeat checker is 30 lines:
// heartbeat-store.ts (in-memory, or use Redis)
const lastPing: Record<string, Date> = {};
export function recordPing(jobId: string) {
  lastPing[jobId] = new Date();
}
export function checkStale(jobId: string, maxAgeMs: number): boolean {
  const last = lastPing[jobId];
  if (!last) return true; // never pinged = stale (also true right after a restart)
  return Date.now() - last.getTime() > maxAgeMs;
}
// watchdog.ts: runs every hour, in the same process that calls recordPing
import cron from 'node-cron';
import { checkStale } from './heartbeat-store';
cron.schedule('0 * * * *', () => {
  // Backup should have run in the last 25 hours
  if (checkStale('nightly-backup', 25 * 60 * 60 * 1000)) {
    sendAlert('Nightly backup has not run in 25+ hours!'); // your alerting function, defined elsewhere
  }
});
What to Monitor & Alert On
Always monitor
- Database backups
- Financial reports / invoicing
- Data sync jobs (ETL, API pulls)
- User notification delivery
- Certificate renewal jobs
Consider monitoring
- Search index rebuilds
- Cache warm-up jobs
- Analytics aggregation
- Email digest delivery
- Health check sweeps
Alert thresholds
- Grace period = 20–25% of job interval
- Duration alert if job runs 3× longer than usual
- Failure alert after 1 miss (critical jobs)
- Failure alert after 2–3 misses (non-critical)
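The thresholds above translate directly into arithmetic. The following Python sketch works through them; the function names and the default of 2 tolerated misses are assumptions for illustration, picked from within the ranges in the list.

```python
from datetime import timedelta


def grace_period(interval: timedelta, fraction: float = 0.25) -> timedelta:
    """Grace period as a fraction of the job interval (20-25% per the list above)."""
    return interval * fraction


def duration_alert(duration_s: float, typical_s: float, factor: float = 3.0) -> bool:
    """True when a run took more than `factor` times its usual duration."""
    return duration_s > factor * typical_s


def should_alert(consecutive_misses: int, critical: bool, tolerated: int = 2) -> bool:
    """Critical jobs alert on the first miss; others once `tolerated` misses accumulate."""
    return consecutive_misses >= (1 if critical else tolerated)


print(grace_period(timedelta(hours=24)))              # 6:00:00
print(grace_period(timedelta(hours=1), 0.2))          # 0:12:00
print(duration_alert(duration_s=400, typical_s=120))  # True, since 400 > 360
print(should_alert(1, critical=True))                 # True
print(should_alert(1, critical=False))                # False
```

For a daily job this gives a grace window of roughly five to six hours; a tighter window, such as the one-hour example earlier, is a reasonable choice when a late backup is itself a problem.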
What not to over-alert on
- Jobs that intentionally run only on business days
- Jobs with known variance (DST, leap year)
- Development/staging jobs with noisy schedules
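For the business-days case, a service that accepts a cron expression can usually encode the schedule directly (for example `0 3 * * 1-5`). In a home-grown checker, the same effect needs an explicit calendar check. The Python sketch below is a minimal version at whole-day granularity; the Monday-to-Friday rule and all function names are assumptions, and public holidays are ignored.

```python
from datetime import date, timedelta


def runs_on(day: date) -> bool:
    """Hypothetical schedule: the job runs Monday to Friday only."""
    return day.weekday() < 5  # Monday is 0, Sunday is 6


def last_expected_run(today: date) -> date:
    """Most recent day, up to and including today, on which a run was due."""
    day = today
    while not runs_on(day):
        day -= timedelta(days=1)
    return day


def missed(last_success: date, today: date) -> bool:
    """Alert only if a due run is missing, so weekends never count as misses."""
    return last_success < last_expected_run(today)


friday, sunday, monday = date(2024, 3, 1), date(2024, 3, 3), date(2024, 3, 4)
print(missed(friday, sunday))  # False: nothing was due over the weekend
print(missed(friday, monday))  # True: Monday's run has not been seen
```

Because the comparison is by date, this check should be evaluated after the job's scheduled time on a given day, plus its grace period.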
Quick Recommendation
- Healthchecks.io: Best overall, with a generous free tier, open-source code, start/fail signals, and Slack integration.
- Cronitor: Best for teams that want a dashboard with duration history and CI/CD integration.
- Dead Man's Snitch: Simplest possible setup for a single critical job (one URL, one curl).