Cron Job Best Practices: 12 Rules for Production Reliability
Most cron job failures are preventable. These 12 practices, drawn from real production incidents, will save you from the most common pitfalls: silent failures, runaway jobs, duplicate runs, and DST-related surprises.
Quick Checklist
Always schedule in UTC
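Schedule all cron jobs in UTC and convert to local time only in your application logic. If you schedule in local time, DST transitions can cause a job to run twice or not at all. GitHub Actions and AWS Lambda schedules (EventBridge rules) are evaluated in UTC, but Linux cron uses the system time zone, so set the server's time zone to UTC (or use CRON_TZ where your cron implementation supports it).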
# BAD: 9 AM local time is ambiguous across DST transitions
0 9 * * * /opt/report.sh
# GOOD: schedule in UTC, convert in code (9 AM EST = 14:00 UTC)
0 14 * * * /opt/report.sh
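One way to keep a job at 9 AM local time all year is to schedule it hourly in UTC and let the script decide whether the local hour matches. The sketch below assumes GNU `date` (for `-d @EPOCH`); `is_local_hour` and the zone name in the usage comment are illustrative, not part of any standard tool.

```shell
#!/bin/sh
# is_local_hour ZONE HOUR [EPOCH]
# Succeeds when the wall-clock hour in ZONE equals HOUR (two digits, 00-23).
# EPOCH defaults to the current time; passing it explicitly makes testing easy.
is_local_hour() {
    zone=$1
    want=$2
    epoch=${3:-$(date +%s)}
    # TZ selects the zone used for formatting; -d @EPOCH is a GNU date extension.
    have=$(TZ="$zone" date -d "@$epoch" +%H)
    [ "$have" = "$want" ]
}

# Hypothetical use at the top of /opt/report.sh, with the crontab entry
# "0 * * * * /opt/report.sh" (hourly, in UTC):
#   is_local_hour America/New_York 09 || exit 0
```

Because the check uses the zone's own rules, the job follows DST changes without anyone editing the crontab.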
Make jobs idempotent
A cron job should produce the same result whether it runs once or multiple times. Platforms can deliver duplicate triggers, machines restart, and ops teams manually re-run jobs. Use upserts instead of inserts, track processed IDs, and check state before acting.
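For jobs that handle discrete items, tracking processed IDs can be as small as a state file. This is a minimal sketch, assuming a single host and a hypothetical state-file path; `process_once` is an illustrative helper, and a database table with a unique key is the sturdier choice when several hosts are involved.

```shell
#!/bin/sh
# Hypothetical state file listing one already-processed ID per line.
STATE_FILE=${STATE_FILE:-/var/lib/myjob/processed.ids}

# process_once ID COMMAND [ARGS...]
# Runs COMMAND only if ID is not yet recorded, and records ID on success.
process_once() {
    id=$1
    shift
    if [ -f "$STATE_FILE" ] && grep -qxF -- "$id" "$STATE_FILE"; then
        return 0    # already handled on an earlier run: do nothing
    fi
    # On failure, leave ID unrecorded so the next run retries it.
    "$@" || return $?
    printf '%s\n' "$id" >> "$STATE_FILE"
}

# Hypothetical use:
#   process_once "report-$(date -u +%F)" /opt/build_report.sh
```

The ID is recorded only after the command succeeds, so a crash mid-run leads to a retry rather than a silently skipped item.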
-- BAD: double insert on re-run
INSERT INTO reports (date, total) VALUES ('2026-04-11', 1500);
-- GOOD: upsert, safe to re-run (requires a unique constraint on date)
INSERT INTO reports (date, total)
VALUES ('2026-04-11', 1500)
ON CONFLICT (date) DO UPDATE SET total = EXCLUDED.total;
Redirect output to logs
By default, cron sends stdout/stderr to the crontab owner's local mail. Redirect output to a log file or structured logging system so you have audit trails and debugging data.
# Capture both stdout and stderr
0 2 * * * /opt/backup.sh >> /var/log/backup.log 2>&1
# Append with timestamps (a crontab entry must stay on one line)
0 2 * * * echo "$(date): start" >> /var/log/backup.log && /opt/backup.sh >> /var/log/backup.log 2>&1 && echo "$(date): done" >> /var/log/backup.log
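Timestamps are easier to keep consistent inside the script than in the crontab line, partly because an unescaped `%` has special meaning in crontab entries. The following is a small sketch of a logging helper; the `log` function name and the key=value layout are assumptions chosen for illustration.

```shell
#!/bin/sh
# log LEVEL MESSAGE...
# Prints one line: ISO 8601 UTC timestamp, level, and quoted message.
log() {
    level=$1
    shift
    printf '%s level=%s msg="%s"\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*"
}

# Hypothetical use inside /opt/backup.sh, with the crontab entry still
# redirecting stdout and stderr to the log file:
#   log info "backup started"
#   log error "upload failed"
```

A fixed line format like this keeps the log easy to grep and to feed into a log shipper later.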
Set a timeout
Runaway jobs can block system resources and prevent the next run from starting. Wrap your command in `timeout` to enforce a maximum execution time.
# Kill job if it runs > 5 minutes
0 * * * * timeout 300 /opt/sync.sh >> /var/log/sync.log 2>&1
# Log any failure, including a timeout (exit code 124)
0 * * * * timeout --kill-after=10 300 /opt/sync.sh || echo "$(date): FAILED or TIMED OUT" >> /var/log/sync-errors.log
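To tell a timeout apart from an ordinary failure, a wrapper can inspect the exit status: GNU `timeout` exits with 124 when the limit is hit, and with 137 (128 + 9) when it had to send SIGKILL. A sketch, where `describe_exit` and `run_limited` are hypothetical helper names:

```shell
#!/bin/sh
# describe_exit CODE: label an exit status returned by GNU timeout.
describe_exit() {
    case $1 in
        0)   echo ok ;;
        124) echo timed-out ;;   # the command ran past the limit
        137) echo killed ;;      # 128 + 9: SIGKILL, e.g. after --kill-after
        *)   echo failed ;;
    esac
}

# run_limited SECONDS COMMAND [ARGS...]
# Runs COMMAND under timeout, prints a label, and returns the original status.
run_limited() {
    limit=$1
    shift
    timeout -k 10 "$limit" "$@"
    status=$?
    echo "$(describe_exit "$status") (exit $status)"
    return "$status"
}

# Hypothetical use:
#   run_limited 300 /opt/sync.sh >> /var/log/sync.log 2>&1
```

Returning the original status lets the caller (or an alerting wrapper) react differently to a hang than to a crash.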
Use distributed locking for concurrent environments
If you run multiple servers with identical crontabs, every instance fires the same job simultaneously. Use a distributed lock (Redis SETNX, PostgreSQL advisory lock, or a purpose-built tool like Redlock) to elect one runner.
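#!/bin/bash
# Acquire Redis lock (expires after 5 min)
LOCK=$(redis-cli SET job:daily-report 1 NX EX 300)
if [ "$LOCK" = "OK" ]; then
  python /opt/daily_report.py
  # Release the lock. If the job can outlive the 5-minute expiry, store a
  # unique token instead of 1 and delete only when the token still matches.
  redis-cli DEL job:daily-report
else
  echo "$(date): Another instance is running, skipping."
fi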
Alert on failures: don't just log them
Logs are silent. Wire up failure notifications to Slack, PagerDuty, or email so that a failing job surfaces immediately. Many teams use Healthchecks.io or Cronitor to verify that jobs both run and succeed.
#!/bin/bash
# Report failure to Slack webhook
run_job() {
  python /opt/billing_sync.py
}
if ! run_job; then
  curl -X POST "$SLACK_WEBHOOK" \
    -H 'Content-type: application/json' \
    --data '{"text":"billing_sync failed at '"$(date)"'"}'
  exit 1
fi
Stagger jobs to avoid thundering herd
When all cron jobs fire at the top of the hour (0 * * * *), they compete for database connections, network bandwidth, and CPU simultaneously. Spread them out with offsets: 2 * * * *, 7 * * * *, 13 * * * *.
# BAD: three jobs all hammering DB at :00
0 * * * * /opt/job-a.sh
0 * * * * /opt/job-b.sh
0 * * * * /opt/job-c.sh
# GOOD: staggered across different minutes
2 * * * * /opt/job-a.sh
7 * * * * /opt/job-b.sh
13 * * * * /opt/job-c.sh
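When there are too many jobs to pick offsets by hand, one option is to derive each offset from the job name so it stays stable across deploys. This sketch assumes the POSIX `cksum` utility; `stagger_minute` is a hypothetical helper, and two names can still land on the same minute.

```shell
#!/bin/sh
# stagger_minute NAME
# Prints a stable minute offset (0-59) derived from a checksum of NAME.
stagger_minute() {
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $((sum % 60))
}

# Hypothetical use when generating crontab lines:
#   echo "$(stagger_minute job-a) * * * * /opt/job-a.sh"
```

The same name always maps to the same minute, so regenerating the crontab does not shuffle the schedule.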
Test with a faster schedule first
Before deploying a daily job, run it every minute in staging to verify logic, error handling, and idempotency. Once confident, switch to the production schedule.
# Staging: run every minute to test
* * * * * /opt/daily-report.sh >> /tmp/report-test.log 2>&1
# Production: after testing, switch to daily
0 2 * * * /opt/daily-report.sh >> /var/log/report.log 2>&1
Document every cron job
Add a comment above each crontab entry explaining what it does, who owns it, and when it was last reviewed. Undocumented cron jobs become mysterious ghosts that no one dares delete.
# Daily billing sync | Finance team | reviewed 2026-01
# Syncs Stripe to internal DB. Safe to re-run. Timeout: 5m.
0 2 * * * timeout 300 /opt/billing-sync.sh >> /var/log/billing.log 2>&1
# Weekly cleanup | Platform team | reviewed 2026-02
# Deletes temp files older than 7 days from /tmp/uploads
0 3 * * 0 find /tmp/uploads -mtime +7 -delete
Handle job overlap explicitly
If a job takes longer than its interval, the next run starts while the previous one is still running. Decide whether to: (a) skip if already running, (b) queue and run sequentially, or (c) allow parallel runs. Use flock or lockfile for option (a).
# Skip if already running (flock)
* * * * * flock -n /tmp/job.lock /opt/slow-job.sh >> /var/log/slow-job.log 2>&1
# Alternative: use run-one utility
* * * * * run-one /opt/slow-job.sh
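Options (a) and (b) differ only in how `flock` is asked to behave when the lock is taken. The sketch below assumes the util-linux `flock` command; `with_lock` and `WAIT_SECS` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# with_lock MODE LOCKFILE COMMAND [ARGS...]
#   skip: give up at once if another run holds the lock (option a)
#   wait: queue behind the current run for up to WAIT_SECS seconds (option b)
WAIT_SECS=${WAIT_SECS:-600}

with_lock() {
    mode=$1
    lock=$2
    shift 2
    case $mode in
        skip) flock -n "$lock" "$@" ;;
        wait) flock -w "$WAIT_SECS" "$lock" "$@" ;;
        *)    echo "with_lock: unknown mode '$mode'" >&2; return 2 ;;
    esac
}

# Hypothetical use:
#   with_lock wait /tmp/job.lock /opt/slow-job.sh
```

A bounded wait keeps queued runs from piling up indefinitely when one run hangs.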
Keep cron jobs stateless and short
Cron jobs should do one thing and exit. Heavy processing should be handed off to a job queue (Celery, Sidekiq, BullMQ). The cron job enqueues the work; workers process it asynchronously.
# BAD: cron job does all the heavy lifting
0 2 * * * python /opt/process-10gb-file.py
# GOOD: cron enqueues, worker processes
0 2 * * * python /opt/enqueue-nightly-job.py
# Worker processes job from queue asynchronously
Use a managed scheduler in production
System cron is fragile: runs scheduled while the machine is down are silently skipped, jobs are hard to monitor, and a single host doesn't scale. For production, consider managed schedulers: AWS EventBridge, GCP Cloud Scheduler, Kubernetes CronJob, or purpose-built tools like Temporal, Inngest, or Trigger.dev.
# Instead of a crontab entry on a VM:
0 2 * * * /opt/billing-sync.sh
# Use AWS EventBridge Scheduler (managed, reliable):
aws scheduler create-schedule \
  --name billing-sync \
  --schedule-expression "cron(0 2 * * ? *)" \
  --flexible-time-window '{"Mode":"OFF"}' \
  --target '{"Arn":"arn:aws:lambda:...","RoleArn":"..."}'
TL;DR: Minimum Viable Cron Job
#!/bin/bash
# Owner: platform-team | Reviewed: 2026-04-11
# What: Daily billing sync (idempotent, upserts only)
# On fail: Slack alert via /opt/notify.sh
set -euo pipefail
LOCK=/tmp/billing-sync.lock
LOG=/var/log/billing-sync.log
# Per-host lock via flock (skip if already running); not a distributed lock
exec 9>"$LOCK"
flock -n 9 || { echo "$(date): Already running, skipping."; exit 0; }
echo "$(date): Starting billing sync" >> "$LOG"
# Run with timeout
timeout 300 python /opt/billing_sync.py >> "$LOG" 2>&1 || {
echo "$(date): FAILED" >> "$LOG"
/opt/notify.sh "billing-sync failed"
exit 1
}
echo "$(date): Done" >> "$LOG"
# Crontab entry (UTC, staggered from other jobs)
# 7 2 * * * /opt/billing-sync.sh
Validate your cron expressions
Use our free tools to build, test, and debug cron expressions before they hit production.