Troubleshooting Quartz
If Quartz stops processing jobs from its queue, start by checking the NETWORK_SCHEDULE_POLICIES table, which lists all scheduled jobs. Ideally, the table should hold only about 100 records (jobs).
Quartz log entries can be viewed in relationals.log on the server where Quartz is enabled. The entries look like the following; the worker threads are the ones that pick up and process jobs in the queue:
<timestamp> MasterScheduler ...
<timestamp> [PlatForm_Worker-1] ...
<timestamp> [PlatForm_Worker-2] ...
<timestamp> RelationalsSchedulerWorker-1 ...
<timestamp> RelationalsSchedulerWorker-2 ...
...
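To see quickly whether the scheduler and worker threads are still writing to the log, a small script can count matching lines and show the most recent one for each kind of thread. This is a minimal sketch: it assumes only that the thread names shown in the sample appear somewhere in each line, and it makes no assumption about the timestamp format. The function name and the default path are illustrative.

```python
import sys
from collections import Counter

# Thread-name markers taken from the sample log entries.
THREAD_MARKERS = (
    "MasterScheduler",
    "PlatForm_Worker-",
    "RelationalsSchedulerWorker-",
)


def summarize_quartz_lines(lines):
    """Count log lines per marker and remember the last line seen for each."""
    counts = Counter()
    last_seen = {}
    for line in lines:
        for marker in THREAD_MARKERS:
            if marker in line:
                counts[marker] += 1
                last_seen[marker] = line.rstrip("\n")
                break  # attribute each line to one marker only
    return counts, last_seen


if __name__ == "__main__":
    # The log path is passed as the first argument; the default is a guess.
    path = sys.argv[1] if len(sys.argv) > 1 else "relationals.log"
    with open(path, encoding="utf-8", errors="replace") as fh:
        counts, last_seen = summarize_quartz_lines(fh)
    for marker in THREAD_MARKERS:
        last = last_seen.get(marker, "(none)")
        print(f"{marker}: {counts[marker]} lines; last: {last}")
```

If the last worker-thread line is much older than the last MasterScheduler line, that may be a hint that the workers have stopped picking up jobs.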
Here are some possible ways that Quartz could get stuck:
- There are too many jobs (records) in NETWORK_SCHEDULE_POLICIES.
  - Counts in the thousands may bog Quartz down. Determine what caused so many jobs to be inserted into the queue. If it is safe to delete them, remove them from the queue. Alternatively, copy them to a temporary table and re-insert them later, when there is no load on the server.
- Quartz might run into an error that causes the Quartz thread to die.
  - In that case, restarting the application server will bring the Quartz thread back to life.
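The copy-and-re-insert approach for an overloaded queue could look roughly like the sketch below. It is written against Python's DB-API, with sqlite3 as a stand-in so that it runs anywhere; the production database, its driver, and its exact SQL dialect are assumptions to adapt (SQL Server, for example, uses SELECT ... INTO rather than CREATE TABLE ... AS). The backup table name NSP_BACKUP and the demo columns are hypothetical. Confirm that the jobs are safe to move before running anything like this.

```python
import sqlite3

QUEUE_TABLE = "NETWORK_SCHEDULE_POLICIES"
BACKUP_TABLE = "NSP_BACKUP"  # hypothetical name for the temporary holding table


def park_jobs(conn):
    """Copy every queued row to the backup table, then empty the queue.

    Returns the number of rows parked.
    """
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE {BACKUP_TABLE} AS SELECT * FROM {QUEUE_TABLE}")
    cur.execute(f"SELECT COUNT(*) FROM {BACKUP_TABLE}")
    parked = cur.fetchone()[0]
    cur.execute(f"DELETE FROM {QUEUE_TABLE}")
    conn.commit()
    return parked


def restore_jobs(conn):
    """Re-insert the parked rows and drop the backup table.

    Returns the number of rows in the queue afterwards.
    """
    cur = conn.cursor()
    cur.execute(f"INSERT INTO {QUEUE_TABLE} SELECT * FROM {BACKUP_TABLE}")
    cur.execute(f"DROP TABLE {BACKUP_TABLE}")
    conn.commit()
    cur.execute(f"SELECT COUNT(*) FROM {QUEUE_TABLE}")
    return cur.fetchone()[0]


if __name__ == "__main__":
    # Demo against an in-memory database with made-up columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {QUEUE_TABLE} (id INTEGER, payload TEXT)")
    conn.executemany(
        f"INSERT INTO {QUEUE_TABLE} VALUES (?, ?)",
        [(i, "job") for i in range(3000)],
    )
    conn.commit()
    print("parked:", park_jobs(conn))
    print("queue size after restore:", restore_jobs(conn))
```

Parking the rows in a separate table keeps them available for inspection while Quartz recovers, and restore_jobs can be run later during a quiet period.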
Tip: Set up a cron job that gets the count of entries in the NETWORK_SCHEDULE_POLICIES table. If the count hasn't changed after 20 minutes or so, it usually means Quartz is not picking up the jobs. (Alternatively, use the REST APIs to create a task that monitors the status of that table.)
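One way to implement the cron check is sketched below. It assumes the row count is obtained separately (for example, by a SELECT COUNT(*) run through whatever database client is available) and passed in as a command-line argument. The script remembers the last count and when it was first seen in a small state file, and it reports a stall once the count has stayed the same for 20 minutes. The state file path, the function name, and the choice to treat a count of zero as an idle queue rather than a stall are all assumptions of this sketch.

```python
import json
import sys
import time
from pathlib import Path

STALL_SECONDS = 20 * 60  # "20 minutes or so" from the tip
STATE_FILE = "/tmp/quartz_queue_state.json"  # hypothetical location


def check_stalled(current_count, state_path, now=None, stall_seconds=STALL_SECONDS):
    """Return True if the count is non-zero and unchanged for stall_seconds."""
    now = time.time() if now is None else now
    state_path = Path(state_path)
    state = None
    if state_path.exists():
        state = json.loads(state_path.read_text())
    if state is None or state["count"] != current_count:
        # First run, or the count moved: record the new baseline.
        state_path.write_text(json.dumps({"count": current_count, "since": now}))
        return False
    # Assumption: an empty queue that stays empty is idle, not stuck.
    return current_count > 0 and (now - state["since"]) >= stall_seconds


if __name__ == "__main__":
    # Usage: quartz_watch.py <row count of NETWORK_SCHEDULE_POLICIES>
    count = int(sys.argv[1])
    if check_stalled(count, STATE_FILE):
        print(f"Queue count stuck at {count}; Quartz may not be picking up jobs")
        sys.exit(1)
    print(f"Queue count {count}: no stall detected")
```

Run from cron every few minutes, the non-zero exit status can be used to trigger a mail or another alert. An unchanged count is only a heuristic, so a hit is worth confirming in relationals.log before restarting anything.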