How to Monitor Azure Databricks in an Azure Log Analytics Workspace


Azure Databricks lets you spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure. As with any production-level solution, monitoring is a critical aspect.

Azure Databricks comes with robust monitoring capabilities for custom application metrics, streaming query events, and application log messages. It allows you to push this monitoring data to different logging services.

In this article, we will look at the setup required to send application logs and metrics from Microsoft Azure Databricks to a Log Analytics workspace.

Prerequisites
  1. Clone the repository mentioned below
    https://github.com/mspnp/spark-monitoring.git
  2. Azure Databricks workspace
  3. Azure Databricks CLI
    A Databricks workspace personal access token is required to use the CLI.
    You can also use the Databricks CLI from Azure Cloud Shell.
  4. A Java IDE with the following resources
    Java Development Kit (JDK) version 1.8
    Scala language SDK 2.11
    Apache Maven 3.5.4
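
As a quick sanity check before starting, the command-line prerequisites above can be verified from a terminal. The following is a minimal sketch and not part of the spark-monitoring repository: `missing_tools` is a hypothetical helper name, and the tool names in the usage comment are assumptions based on the Docker build and the Databricks CLI commands used later in this article.

```shell
#!/bin/sh
# Hypothetical helper: print each command name that is NOT found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Example usage (tool names are assumptions based on the steps in this article):
#   missing_tools git docker databricks dbfs
# An empty result means every listed command was found.
```

If the helper prints any names, install those tools before continuing.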
Building the Azure Databricks monitoring library with Docker

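After cloning the repository, open a terminal in the directory that contains the cloned spark-monitoring folder, since the commands below mount ./spark-monitoring into the container.

Run the command for your platform.
Windows: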

docker run -it --rm -v %cd%/spark-monitoring:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh 

Linux:

chmod +x spark-monitoring/build.sh
docker run -it --rm -v `pwd`/spark-monitoring:/spark-monitoring -v "$HOME/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh 
Configuring Databricks workspace

databricks configure --token
The CLI prompts for the Databricks workspace URL and a token.
Use the personal access token that was generated when setting up the prerequisites.
You can find the URL in the Azure portal under Databricks service > Overview.
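
For scripted use, the legacy Databricks CLI can also read the workspace URL and token from the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables instead of prompting. The sketch below only checks that both are set and that the host looks like an Azure Databricks URL. `check_databricks_env` is a hypothetical helper, and the URL pattern is an assumption about typical workspace URLs, not an official rule.

```shell
#!/bin/sh
# Hypothetical helper: verify the CLI environment variables before running dbfs commands.
check_databricks_env() {
  if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "DATABRICKS_HOST and DATABRICKS_TOKEN must both be set" >&2
    return 1
  fi
  # Assumed shape of an Azure Databricks workspace URL.
  case "$DATABRICKS_HOST" in
    https://*.azuredatabricks.net|https://*.azuredatabricks.net/)
      return 0 ;;
    *)
      echo "unexpected workspace URL: $DATABRICKS_HOST" >&2
      return 1 ;;
  esac
}

# Example usage (placeholder values, replace with your own):
#   export DATABRICKS_HOST="<workspace URL from the Azure portal>"
#   export DATABRICKS_TOKEN="<personal access token>"
#   check_databricks_env && echo "CLI environment looks usable"
```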

Create a DBFS directory to hold the monitoring library:

dbfs mkdirs dbfs:/databricks/spark-monitoring

Open the file src/spark-listeners/scripts/spark-monitoring.sh in the cloned repository.
Add your Log Analytics workspace ID and key to it.
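
If you prefer to script this edit rather than use an editor, something like the following could work. It is a sketch under one important assumption: that your copy of the script contains lines starting with `export LOG_ANALYTICS_WORKSPACE_ID=` and `export LOG_ANALYTICS_WORKSPACE_KEY=`. Check the file first, because those variable names are not confirmed by this article. `set_workspace_credentials` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write the workspace ID and key into the init script.
# ASSUMPTION: the script has lines beginning with
#   export LOG_ANALYTICS_WORKSPACE_ID=
#   export LOG_ANALYTICS_WORKSPACE_KEY=
set_workspace_credentials() {
  script_path="$1"
  workspace_id="$2"
  workspace_key="$3"
  tmp_file="${script_path}.tmp"

  # "|" is the sed delimiter because base64-encoded keys may contain "/".
  sed \
    -e "s|^export LOG_ANALYTICS_WORKSPACE_ID=.*|export LOG_ANALYTICS_WORKSPACE_ID=${workspace_id}|" \
    -e "s|^export LOG_ANALYTICS_WORKSPACE_KEY=.*|export LOG_ANALYTICS_WORKSPACE_KEY=${workspace_key}|" \
    "$script_path" > "$tmp_file" || return 1

  # Copy the contents back so the original file keeps its permissions.
  cat "$tmp_file" > "$script_path" && rm -f "$tmp_file"
}

# Example usage (placeholder values):
#   set_workspace_credentials <local path to spark-monitoring.sh> "<workspace id>" "<workspace key>"
```

Lines that do not match the two assumed patterns are left unchanged, so running the helper against a script with different variable names has no effect.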

Use the Databricks CLI to copy the modified script to DBFS:

dbfs cp <local path to spark-monitoring.sh> dbfs:/databricks/spark-monitoring/spark-monitoring.sh 

Use the Databricks CLI to copy all of the JAR files generated by the build:

dbfs cp --overwrite --recursive <local path to target folder> dbfs:/databricks/spark-monitoring/ 
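
Because the recursive copy above uploads whatever is in the folder, it may be worth confirming first that the build actually produced JAR files there. The helper below is a small sketch for that check. `list_built_jars` is a hypothetical name, and the folder is passed as an argument because its location depends on where you ran the build.

```shell
#!/bin/sh
# Hypothetical helper: list the JAR files directly inside a build output folder.
list_built_jars() {
  target_dir="$1"
  find "$target_dir" -maxdepth 1 -type f -name '*.jar' | sort
}

# Example usage:
#   jars=$(list_built_jars <local path to target folder>)
#   [ -n "$jars" ] || echo "no JAR files found; re-run the build" >&2
```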
Create and configure the Azure Databricks cluster
  1. Navigate to your Azure Databricks workspace in the Azure Portal.
  2. On the home page, click on "new cluster".
  3. Choose a name for your cluster and enter it in the text box titled "cluster name".
  4. In the "Databricks Runtime Version" dropdown, select 5.0 or later (includes Apache Spark 2.4.0, Scala 2.11).

  5. Under "Advanced Options", click on the "Init Scripts" tab. Go to the last line under the "Init Scripts" section and select "DBFS" under the "Destination" dropdown. Enter "dbfs:/databricks/spark-monitoring/spark-monitoring.sh" in the text box. Click the "Add" button.
  6. Click the "Create Cluster" button to create the cluster. Next, click on the "Start" button to start the cluster.
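
The same cluster can, in principle, be created from the command line instead of the portal. The sketch below writes a cluster specification that attaches the init script and shows, in a comment, how it might be submitted with the legacy Databricks CLI. `write_cluster_spec` is a hypothetical helper, and the cluster name, runtime version, node type, and worker count are example values that may not be available in your workspace.

```shell
#!/bin/sh
# Hypothetical helper: write a cluster spec that attaches the monitoring init script.
# The name, runtime version, node type, and worker count are example values only.
write_cluster_spec() {
  output_file="$1"
  cat > "$output_file" <<'EOF'
{
  "cluster_name": "spark-monitoring-demo",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/databricks/spark-monitoring/spark-monitoring.sh" } }
  ]
}
EOF
}

# Example usage (requires a configured legacy Databricks CLI):
#   write_cluster_spec cluster.json
#   databricks clusters create --json-file cluster.json
```

Keeping the specification in a file makes the init script path easy to review and reuse across clusters.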

You can now run jobs on the cluster, and their logs and metrics will be sent to the Log Analytics workspace.

We hope this article helps you set up the right configurations to send application logs and metrics from Azure Databricks to your Log Analytics workspace.
