Best Practices · Production · Advanced · 11 min read

Cron Job Best Practices: 12 Rules for Production Reliability

Most cron job failures are preventable. These 12 practices — drawn from real production incidents — will save you from the most common pitfalls: silent failures, runaway jobs, duplicate runs, and DST-related surprises.

Quick Checklist

🕐 1. Always schedule in UTC
🔁 2. Make jobs idempotent
📋 3. Redirect output to logs
⏱ 4. Set a timeout
🔒 5. Use distributed locking for concurrent environments
🚨 6. Alert on failures — don't just log them
⚡ 7. Stagger jobs to avoid thundering herd
🧪 8. Test with a faster schedule first
📝 9. Document every cron job
🔄 10. Handle job overlap explicitly
🪶 11. Keep cron jobs stateless and short
☁ 12. Use a managed scheduler in production

🕐

Rule #1 (Critical)

Always schedule in UTC

Schedule all cron jobs in UTC and convert to local time only in your application logic. If you schedule in local time, a DST transition can make a job run twice, skip a run, or shift by an hour relative to UTC. GitHub Actions and AWS EventBridge evaluate schedules in UTC by default. Linux cron uses the system time zone, so set the server's time zone to UTC.

# BAD: 9 AM in the server's local time zone; the UTC instant shifts at every DST change
0 9 * * * /opt/report.sh

# GOOD: schedule in UTC, convert in code
# 14:00 UTC = 9 AM EST (10 AM during EDT)
0 14 * * * /opt/report.sh
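
If the report really must go out at 9 AM local time all year, one option is to trigger the script at both candidate UTC hours (13:00 and 14:00) and let the code decide. The sketch below assumes Python 3.9+ with the standard `zoneinfo` module and an installed time zone database. `should_run`, the zone name, and the target hour are illustrative names, not part of any scheduler.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def should_run(now_utc, tz_name="America/New_York", local_hour=9):
    """Return True when the given UTC instant falls in the target local hour."""
    local = now_utc.astimezone(ZoneInfo(tz_name))
    return local.hour == local_hour


if __name__ == "__main__":
    # Hypothetical guard at the top of a report script.
    if should_run(datetime.now(timezone.utc)):
        print("running report")
    else:
        print("not the target local hour, skipping")
```

The crontab stays in UTC and the DST rules live in one place, the time zone database, rather than in someone's memory.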

🔁

Rule #2 (Critical)

Make jobs idempotent

A cron job should produce the same result whether it runs once or multiple times. Platforms can deliver duplicate triggers, machines restart, and ops teams manually re-run jobs. Use upserts instead of inserts, track processed IDs, and check-before-act.

-- BAD: double insert on re-run
INSERT INTO reports (date, total) VALUES ('2026-04-11', 1500);

-- GOOD: upsert, safe to re-run (requires a unique constraint on date)
INSERT INTO reports (date, total)
VALUES ('2026-04-11', 1500)
ON CONFLICT (date) DO UPDATE SET total = EXCLUDED.total;
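
The same upsert can be exercised from application code. This is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database as a stand-in for the real one. It assumes SQLite 3.24 or newer (needed for `ON CONFLICT ... DO UPDATE`), and `save_report` is a hypothetical helper name.

```python
import sqlite3


def save_report(conn, date, total):
    """Insert or update the report row for a date; safe to call repeatedly."""
    conn.execute(
        "INSERT INTO reports (date, total) VALUES (?, ?) "
        "ON CONFLICT (date) DO UPDATE SET total = excluded.total",
        (date, total),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# The PRIMARY KEY on date is the unique constraint the upsert relies on.
conn.execute("CREATE TABLE reports (date TEXT PRIMARY KEY, total INTEGER)")

save_report(conn, "2026-04-11", 1500)
save_report(conn, "2026-04-11", 1700)  # simulated re-run with a corrected total
```

After both calls the table still holds a single row for that date, carrying the latest total.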

📋

Rule #3 (High)

Redirect output to logs

By default, cron mails stdout/stderr to the crontab owner (or the MAILTO address), and on hosts without a working mail setup that output is simply lost. Redirect output to a log file or structured logging system so you have audit trails and debugging data.

# Capture both stdout and stderr
0 2 * * * /opt/backup.sh >> /var/log/backup.log 2>&1

# Append with timestamps (a crontab entry must fit on one line; cron does not support backslash continuation)
0 2 * * * { echo "$(date): start" && /opt/backup.sh && echo "$(date): done"; } >> /var/log/backup.log 2>&1
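
For the "structured logging" option, a job written in Python can emit one JSON object per line, which is easier to search and ship than free text. The sketch below uses only the standard `logging` and `json` modules. `JsonFormatter`, `make_logger`, and the field names are illustrative choices, and the in-memory buffer stands in for a real log file or stream.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "job": record.name,
            "message": record.getMessage(),
        })


def make_logger(job_name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(job_name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


buffer = io.StringIO()  # stand-in for an open log file
log = make_logger("backup", buffer)
log.info("start")
log.info("done")
```

In a real job the stream would be `sys.stdout`, so the crontab redirection above still decides where the lines end up.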

⏱

Rule #4 (High)

Set a timeout

Runaway jobs can block system resources and prevent the next run from starting. Wrap your command in `timeout` to enforce a maximum execution time.

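# Kill job if it runs > 5 minutes
0 * * * * timeout 300 /opt/sync.sh >> /var/log/sync.log 2>&1

# With exit code handling (kept on one line; exit 124 means the time limit was hit, 137 means the job had to be killed)
0 * * * * timeout --kill-after=10 300 /opt/sync.sh || echo "exit $? at $(date): failed or timed out" >> /var/log/sync-errors.log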
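
When the job is launched from a Python wrapper instead of directly from the crontab, the same limit can be enforced with the `timeout` argument of `subprocess.run`. This is a sketch under that assumption, and `run_with_timeout` with its three return labels is a hypothetical helper.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run command and return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # A deliberately slow child process, used only for demonstration.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, seconds=1))
```

Unlike the crontab one-liner, this separates "timed out" from "failed", so each case can be logged or alerted on differently. Note that only the direct child is killed. Processes it spawned may need their own handling.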

🔒

Rule #5 (High)

Use distributed locking for concurrent environments

If you run multiple servers with identical crontabs, every instance fires the same job simultaneously. Use a distributed lock (Redis SET with NX, a PostgreSQL advisory lock, or the Redlock algorithm) so that only one instance runs the job.

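#!/bin/bash
# Acquire Redis lock (expires after 5 min); the token identifies this holder
TOKEN="$(hostname)-$$"
LOCK=$(redis-cli SET job:daily-report "$TOKEN" NX EX 300)

if [ "$LOCK" = "OK" ]; then
  python /opt/daily_report.py
  # Release only if we still hold the lock: it may have expired and been taken by another host
  redis-cli EVAL 'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end' 1 job:daily-report "$TOKEN" > /dev/null
else
  echo "$(date): Another instance is running, skipping."
fi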
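
The locking logic is easier to reason about when it is separated from the store. The sketch below is an in-memory stand-in, not a Redis client: `InMemoryLockStore` mimics the semantics of `SET key value NX EX ttl` plus an owner-checked delete, and `run_exclusively` is a hypothetical wrapper. In production the two store methods would be backed by a shared service such as Redis or PostgreSQL.

```python
import time
import uuid


class InMemoryLockStore:
    """Stand-in for a shared store; NOT distributed, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set_if_absent(self, key, token, ttl):
        """Store token under key unless an unexpired entry already exists."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False
        self._data[key] = (token, now + ttl)
        return True

    def delete_if_owner(self, key, token):
        """Delete key only if it still holds our token."""
        current = self._data.get(key)
        if current is not None and current[0] == token:
            del self._data[key]
            return True
        return False


def run_exclusively(store, key, job, ttl=300):
    """Run job() if the lock is free; return 'ran' or 'skipped'."""
    token = uuid.uuid4().hex
    if not store.set_if_absent(key, token, ttl):
        return "skipped"
    try:
        job()
        return "ran"
    finally:
        store.delete_if_owner(key, token)
```

The unique token matters: if a job outlives the TTL and another host takes the lock, the first host must not delete a lock it no longer owns.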

🚨

Rule #6 (High)

Alert on failures — don't just log them

Logs are silent. Wire up failure notifications to Slack, PagerDuty, or email so that a failing job surfaces immediately. Many teams use Healthchecks.io or Cronitor to monitor that jobs run AND succeed.

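#!/bin/bash
# Report failure to Slack webhook (SLACK_WEBHOOK must be set in the job's environment)
run_job() {
  python /opt/billing_sync.py
}

if ! run_job; then
  # -f makes curl exit non-zero on HTTP errors; -sS hides progress but still prints errors
  curl -fsS -X POST "$SLACK_WEBHOOK" \
    -H 'Content-type: application/json' \
    --data '{"text":"❌ billing_sync failed at '"$(date)"'"}'
  exit 1
fi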
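
The same pattern can live inside a Python job. This sketch uses only the standard library. `build_payload`, `post_json`, and `run_and_alert` are hypothetical helper names, and the `{"text": ...}` body simply mirrors the payload used in the shell example above. The sender is passed in as a function so the alerting path can be tested without a network call.

```python
import json
import urllib.request


def build_payload(job_name, error):
    """Build the JSON body for the alert message."""
    return {"text": f"{job_name} failed: {error}"}


def post_json(url, payload):
    """Send payload as JSON via HTTP POST and return the response status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_and_alert(job_name, job, send):
    """Run job(); on any exception call send(payload), then re-raise."""
    try:
        job()
    except Exception as error:
        send(build_payload(job_name, error))
        raise
```

A real job might pass `send=lambda payload: post_json(webhook_url, payload)`, with `webhook_url` read from the environment. Re-raising keeps the exit code non-zero, so cron-level monitoring still sees the failure.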

⚡

Rule #7 (Medium)

Stagger jobs to avoid thundering herd

When all cron jobs fire at the top of the hour (0 * * * *), they compete for database connections, network bandwidth, and CPU simultaneously. Spread them out with offsets: 2 * * * *, 7 * * * *, 13 * * * *.

# BAD: three jobs all hammering DB at :00
0 * * * *  /opt/job-a.sh
0 * * * *  /opt/job-b.sh
0 * * * *  /opt/job-c.sh

# GOOD: staggered at minutes :02, :07, and :13
2 * * * *  /opt/job-a.sh
7 * * * *  /opt/job-b.sh
13 * * * * /opt/job-c.sh
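
With many jobs, picking offsets by hand gets tedious. One possible approach is to derive a stable minute from the job name when generating the crontab. The sketch below uses a SHA-256 digest so the offset stays the same across runs and machines. `minute_offset` and `hourly_entry` are illustrative helpers, not an established tool.

```python
import hashlib


def minute_offset(job_name, window=60):
    """Map a job name to a stable minute in the range [0, window)."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % window


def hourly_entry(job_name, command):
    """Render an hourly crontab line at the job's derived minute."""
    return f"{minute_offset(job_name)} * * * * {command}"


if __name__ == "__main__":
    for name in ("job-a", "job-b", "job-c"):
        print(hourly_entry(name, f"/opt/{name}.sh"))
```

Hash-based offsets spread load on average but can still collide, so review the generated minutes for the heaviest jobs.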

🧪

Rule #8 (Medium)

Test with a faster schedule first

Before deploying a daily job, run it every minute in staging to verify logic, error handling, and idempotency. Once confident, switch to the production schedule.

# Staging: run every minute to test
* * * * * /opt/daily-report.sh >> /tmp/report-test.log 2>&1

# Production: after testing, switch to daily
0 2 * * * /opt/daily-report.sh >> /var/log/report.log 2>&1

📝

Rule #9 (Medium)

Document every cron job

Add a comment above each crontab entry explaining what it does, who owns it, and when it was last reviewed. Undocumented cron jobs become mysterious ghosts that no one dares delete.

# Daily billing sync — Finance team — reviewed 2026-01
# Syncs Stripe to internal DB. Safe to re-run. Timeout: 5m.
0 2 * * * timeout 300 /opt/billing-sync.sh >> /var/log/billing.log 2>&1

# Weekly cleanup — Platform team — reviewed 2026-02
# Deletes temp files older than 7 days from /tmp/uploads
0 3 * * 0 find /tmp/uploads -mtime +7 -delete
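
The documentation rule can be enforced mechanically, for example in CI. The following sketch flags crontab entries that have no comment on the line directly above them. `undocumented_entries` is a hypothetical helper, and the check is deliberately simple: it only skips blank lines, comments, and `NAME=value` environment lines.

```python
import re

# Matches environment assignments such as MAILTO=ops@example.com
ENV_LINE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")


def undocumented_entries(crontab_text):
    """Return job lines that are not directly preceded by a comment line."""
    missing = []
    previous = ""
    for raw in crontab_text.splitlines():
        line = raw.strip()
        is_job = bool(line) and not line.startswith("#") and not ENV_LINE.match(line)
        if is_job and not previous.startswith("#"):
            missing.append(line)
        previous = line
    return missing


sample = """# Daily billing sync (Finance team)
0 2 * * * /opt/billing-sync.sh

0 3 * * 0 /opt/cleanup.sh
"""
```

For the sample above, `undocumented_entries(sample)` reports only the cleanup entry, because the billing line has a comment immediately above it.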

🔄

Rule #10 (Medium)

Handle job overlap explicitly

If a job takes longer than its interval, the next run starts while the previous one is still running. Decide whether to: (a) skip if already running, (b) queue and run sequentially, or (c) allow parallel runs. Use flock or lockfile for option (a).

# Skip if already running (flock); keep the entry on one line, since cron has no line continuation
* * * * * flock -n /tmp/job.lock /opt/slow-job.sh >> /var/log/slow-job.log 2>&1

# Alternative: use the run-one utility (packaged for Ubuntu)
* * * * * run-one /opt/slow-job.sh
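
Option (a) can also be implemented inside the job itself. The sketch below takes a non-blocking exclusive lock with the standard `fcntl` module, which is available on Unix-like systems only. `try_lock` and the lock file name are illustrative.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open handle holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep it open: closing the handle releases the lock


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "slow-job.lock")
    lock = try_lock(lock_path)
    if lock is None:
        print("already running, skipping")
    else:
        print("lock acquired, running job")
```

Because the kernel releases the lock when the process exits, a crashed job does not leave a stale lock behind, unlike a hand-rolled PID file.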

🪶

Rule #11 (Medium)

Keep cron jobs stateless and short

Cron jobs should do one thing and exit. Heavy processing should be handed off to a job queue (Celery, Sidekiq, BullMQ). The cron job enqueues the work; workers process it asynchronously.

# BAD: cron job does all the heavy lifting
0 2 * * * python /opt/process-10gb-file.py

# GOOD: cron enqueues, worker processes
0 2 * * * python /opt/enqueue-nightly-job.py
# Worker processes job from queue asynchronously
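
The split between enqueueing and processing can be illustrated with the standard library alone. In this sketch `queue.Queue` and a thread stand in for a real broker and worker process (Celery, Sidekiq, BullMQ). The task name and file paths are made up for the example.

```python
import queue
import threading

jobs = queue.Queue()
processed = []


def enqueue_nightly_job(path):
    """What the cron-invoked script does: enqueue and return immediately."""
    jobs.put({"task": "process_file", "path": path})


def worker():
    """What a long-running worker does: pull jobs until told to stop."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel value: shut down
                return
            processed.append(job["path"])  # placeholder for the heavy work
        finally:
            jobs.task_done()


if __name__ == "__main__":
    thread = threading.Thread(target=worker)
    thread.start()
    enqueue_nightly_job("/data/nightly.csv")  # hypothetical input file
    jobs.put(None)
    thread.join()
    print(processed)
```

The cron side finishes in milliseconds regardless of how long the processing takes, so overlap and timeout concerns move to the worker, where retries and concurrency limits are easier to manage.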

☁

Rule #12 (Advisory)

Use a managed scheduler in production

System cron is fragile: runs that fall while the machine is down are silently skipped, jobs are hard to monitor, and a single host does not scale. For production, consider managed schedulers: AWS EventBridge, GCP Cloud Scheduler, Kubernetes CronJob, or purpose-built tools like Temporal, Inngest, or Trigger.dev.

# Instead of a crontab entry on a VM:
0 2 * * * /opt/billing-sync.sh

# Use AWS EventBridge Scheduler (managed, reliable):
aws scheduler create-schedule \
  --name billing-sync \
  --schedule-expression "cron(0 2 * * ? *)" \
  --flexible-time-window '{"Mode":"OFF"}' \
  --target '{"Arn":"arn:aws:lambda:...","RoleArn":"..."}'

TL;DR — Minimum Viable Cron Job

#!/bin/bash
# Owner: platform-team | Reviewed: 2026-04-11
# What: Daily billing sync — idempotent, upserts only
# On fail: Slack alert via /opt/notify.sh

set -euo pipefail

LOCK=/tmp/billing-sync.lock
LOG=/var/log/billing-sync.log

# Local lock via flock (skip if already running); use a distributed lock if several hosts run this job
exec 9>"$LOCK"
flock -n 9 || { echo "$(date): Already running, skipping." >> "$LOG"; exit 0; }

echo "$(date): Starting billing sync" >> "$LOG"

# Run with timeout
timeout 300 python /opt/billing_sync.py >> "$LOG" 2>&1 || {
  echo "$(date): FAILED" >> "$LOG"
  /opt/notify.sh "billing-sync failed"
  exit 1
}

echo "$(date): Done" >> "$LOG"

# Crontab entry (UTC, staggered from other jobs)
# 7 2 * * * /opt/billing-sync.sh
