Some days ago I was working with a partner to review a Dynamics 365 Business Central project. The project makes heavy use of the Job Queue, with lots of scheduled tasks (both business tasks and integration tasks).
When I see this type of scenario, I always want to analyze why all these things need to be scheduled and (especially) whether all these tasks really need to be performed by the Dynamics 365 Business Central job queue. My first personal rule is: is your task an integration task performing HTTP calls to external applications? If so, please remove it from the Job Queue and use another timer-triggered service instead (Logic Apps, for example, is better suited for that).
For this project, we moved some of these tasks outside Business Central, which gave us a general optimization of the processes (and reduced the performance impact on the tenant itself).
The partner then asked how they could be notified when a Job Queue task fails. For this purpose, Microsoft (thanks Kennie) recently shared a Kusto (KQL) query that you can run against your telemetry store (Application Insights) to discover whether job queue tasks have failed in a given period.
The KQL query is defined as follows:
traces
| where timestamp > ago(1d) // adjust as needed
| where customDimensions.eventId == 'AL0000E26'
| project timestamp
, aadTenantId = customDimensions.aadTenantId
, environmentName = customDimensions.environmentName
, environmentType = customDimensions.environmentType
, alJobQueueId = customDimensions.alJobQueueId
, alJobQueueObjectId = customDimensions.alJobQueueObjectId
, alJobQueueObjectType = customDimensions.alJobQueueObjectType
, alJobQueueStatus = customDimensions.alJobQueueStatus
, alJobQueueExecutionTimeInMs = customDimensions.alJobQueueExecutionTimeInMs
, alJobQueueResult = customDimensions.alJobQueueResult
| where alJobQueueResult !in( '', 'Success' )
UPDATE 01/09/2022: The KQL query above must now be changed. The new query is:
traces
| where timestamp > ago(60d) // adjust as needed
and customDimensions.eventId == 'AL0000HE7'
| project timestamp
, aadTenantId = customDimensions.aadTenantId
, environmentName = customDimensions.environmentName
, environmentType = customDimensions.environmentType
, alJobQueueId = customDimensions.alJobQueueId
, alJobQueueObjectId = customDimensions.alJobQueueObjectId
, alJobQueueObjectType = customDimensions.alJobQueueObjectType
, alJobQueueStatus = customDimensions.alJobQueueStatus
, alJobQueueExecutionTimeInMs = customDimensions.alJobQueueExecutionTimeInMs
, alJobQueueStacktrace = customDimensions.alJobQueueStacktrace // stack trace added in 20.5
This query is also useful because you can set up an alert with a KQL query condition and be notified when a job queue failure happens.
You can directly create an alert rule from Application Insights logs:
When you click New alert rule, the Search query tab opens pre-populated with your log query. By default, the rule counts the number of results in the last 5 minutes, but you can change this according to your needs.
In the Measurement section, you need to select how to summarize the results. Log alerts can measure two different things, which can be used for different monitoring scenarios:
- Table rows: The number of rows returned can be used to work with events such as Windows event logs, syslog, application exceptions.
- Calculation of a numeric column: Calculations based on any numeric column can be used to include any number of resources. For example, CPU percentage.
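To make the second measurement type more concrete, here is a minimal sketch of a query shaped for it. It reuses the event ID and the alJobQueueExecutionTimeInMs dimension from the first query above; everything else is an assumption for illustration: that you want to watch execution time at all, that the value needs an explicit cast to a number, and the column name executionTimeInMs.

```kusto
// Sketch only: expose a numeric column that an alert rule could aggregate.
traces
| where timestamp > ago(1d) // adjust as needed
| where customDimensions.eventId == 'AL0000E26'
// customDimensions values are dynamic, so cast before measuring
| extend executionTimeInMs = toreal(customDimensions.alJobQueueExecutionTimeInMs)
| where isnotnull(executionTimeInMs)
| project timestamp
    , alJobQueueObjectId = tostring(customDimensions.alJobQueueObjectId)
    , executionTimeInMs
```

With a query like this, the rule could measure for example the average of executionTimeInMs instead of counting rows.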
Here the goal is to send an alert if we have a failed job queue task in a given period (for example 1 hour), so the alert rule can be created as follows:
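A minimal sketch of a query that could back such a rule is shown here. It is a trimmed-down variant of the updated query above, assuming the 1 hour period from the example and assuming that the listed dimensions are enough for the notification; the tostring() casts are my addition to get plain string columns.

```kusto
// Sketch only: one row per failed job queue run in the last hour.
traces
| where timestamp > ago(1h) // matches the example period; adjust as needed
| where customDimensions.eventId == 'AL0000HE7'
| project timestamp
    , environmentName = tostring(customDimensions.environmentName)
    , alJobQueueId = tostring(customDimensions.alJobQueueId)
    , alJobQueueObjectId = tostring(customDimensions.alJobQueueObjectId)
    , alJobQueueObjectType = tostring(customDimensions.alJobQueueObjectType)
```

Combined with the Table rows measurement and a threshold of "greater than 0", a query of this shape should fire the alert as soon as at least one failure is logged in the period.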
In the Actions tab, select or create the required action groups. They are used to notify users about the alert and to take an action. An action group is a collection of notification preferences defined by the owner of an Azure subscription. Here, for example, I created an action group that sends email notifications and push notifications to the Azure mobile app:
When you configure an action to notify a person by email or SMS, they receive a confirmation indicating that they have been added to the action group. Please check that your email filtering is configured appropriately. Emails are sent from the following email addresses:
To enable push notifications to the Azure mobile app, provide the email address that you use as your account ID when you configure the Azure mobile app.
In the Azure mobile app, make sure that push notifications are enabled (sorry for the Italian localization of the mobile app here):
When all is ready, you will receive job queue failure alerts via email and also as a push notification on your phone. You can also see the details of all the notifications in the Azure mobile app:
If you want to be proactive and help your customers react to problems, I suggest you start using alerts.
I’ve set this up as a test and triggered a job queue to fail – however Application Insights doesn’t seem to register a job queue as Failed with the KQL query and therefore no notification is sent.
How can we make sure that a failed job queue will be registered on Application Insights?
This sounds strange. If you schedule a codeunit via job queue and the codeunit fails, this should be logged. You can check the signals where:
| where customDimensions.eventId in ("AL0000E26") and customDimensions.alJobQueueResult == "Fail"
Issue for me is that the telemetry does not contain the dimension alJobQueueResult.
What environment version do you have?
Is this what you are looking for? Or where do I find it?
Hi Demiliani, thanks for blogging, very useful as usual. Same issue here. Version: W1 19.2 (Platform 19.0.32956.33475 + Application 19.2.32968.33504), OnPrem: no JobQueue ingestion at all, only task scheduler. MS Docs states this version should ingest Job Queue telemetry. Any thoughts?
Sorry but this sounds strange. If you execute the following query, do you have some results?
| where customDimensions.eventId in ("AL0000E26")
Hi, thanks for reaching out. I have no results on -> "traces
| where customDimensions.eventId in ("AL0000E26")" and we heavily use Job Queues. I would like to mention we have three instances: one for win, one for nup and one for acs authentication. They all have the App Insights connection string.
But are you talking about on-premise?
yes. BC 100% on-premises
alJobQueueResult is gone:
Yes, the KQL query is different now. I’ve updated the post, thanks.
Many thanks for sharing and updating this article, it is very interesting.
One question: is it possible to make reference to at least the company where the error is happening?
I’m thinking of databases that have 5+ companies: if the user receives an email saying that one of the job queues is in error, they’ll have to review company by company.
Do you think it would be possible to add information about the offending company, and even the job queue name, with Application Insights?