Monitoring your Azure OpenAI usage

I want to continue the saga of “using Azure OpenAI in a professional way” by talking a bit about a practice you should always adopt: monitoring your AI usage. As I said in previous posts, using Azure OpenAI in the real world is not just about sending prompts to a model: you also need to carefully handle requests and monitor the underlying deployments.

Azure OpenAI uses Azure Monitor under the hood, and you should start by checking the available metrics in order to keep your AI usage under control.

The first thing I personally check regularly in production deployments is the Overview pane, where you can find the Azure OpenAI Metrics Dashboard. This dashboard gives you important information about HTTP Requests, Tokens-Based Usage, PTU Utilization, and Fine-tuning:

These dashboards are, in my opinion, the first place to check. Here you can see in detail how your users are sending AI requests, discover usage peaks over time, and spot deployments and models under heavy load.

By going into Metrics you can also analyze the Processed Prompt Tokens (the total number of prompt (input) tokens processed by an OpenAI model):

and the Generated Completion Tokens (the number of (output) tokens generated by an OpenAI model):

and also the Processed Inference Tokens (the number of inference tokens processed by an OpenAI model, calculated as prompt tokens (input) + generated tokens (output)):

These metrics are useful because they directly impact your AI costs.

For more advanced monitoring, remember that all metrics can be exported via a diagnostic setting in Azure Monitor. To do that, from your Azure OpenAI resource page, under Monitoring, select Diagnostic settings in the left pane. On the Diagnostic settings page, select Add diagnostic setting:

Here, set a name for your diagnostic setting, check Send to Log Analytics workspace, and select your Azure subscription and your Log Analytics workspace. Under Logs, select allLogs and, under Metrics, select AllMetrics:

Now the metrics and log data for your Azure OpenAI resource are exported to your Log Analytics workspace, and you can start using KQL (Kusto Query Language) to query them. The tables to query are AzureDiagnostics and AzureMetrics.
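For example, once the AllMetrics export is active, you can chart token consumption over time from the AzureMetrics table. This is a sketch: the exact MetricName values can vary, so it is worth running a `distinct MetricName` query first to see what your workspace actually receives:

```kusto
// Discover which metric names are exported for your resource
AzureMetrics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| distinct MetricName

// Then summarize daily token usage for the token-related metrics
// (adjust the metric names to match the output of the query above)
AzureMetrics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where MetricName in ("ProcessedPromptTokens", "GeneratedTokens")
| summarize TotalTokens = sum(Total) by MetricName, bin(TimeGenerated, 1d)
| order by TimeGenerated asc
```

A daily bin is usually enough for cost tracking; switch to `bin(TimeGenerated, 1h)` if you want to correlate token spikes with specific usage peaks.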

Here is an example of a KQL query to inspect the request and response logs from Azure OpenAI:

AzureDiagnostics
| where ResourceProvider contains "MICROSOFT.COGNITIVESERVICES"
| where Category contains "RequestResponse"
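Starting from that base query, you can aggregate the logs to get a quick health view. The sketch below relies on OperationName and DurationMs, which are standard columns of the AzureDiagnostics table, though what gets populated depends on the resource emitting the logs:

```kusto
// Hourly request count and average latency per operation
AzureDiagnostics
| where ResourceProvider contains "MICROSOFT.COGNITIVESERVICES"
| where Category contains "RequestResponse"
| summarize Requests = count(), AvgDurationMs = avg(DurationMs)
    by OperationName, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
```

A sudden jump in AvgDurationMs or in the request count for a single operation is often the first signal of an anomaly worth investigating.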

I hope Microsoft keeps improving the signals emitted by Azure OpenAI, because a lot more could be done here (for example, I don’t know why they have now removed the properties_s field from AzureDiagnostics records, which in the past was useful for inspecting request and response times from the AI model, as well as request and response lengths).

When using Azure OpenAI models, don’t forget to also establish a monitoring practice for your deployments: it helps a lot in controlling anomalies and costs.
