> databricks-debug-bundle
Collect Databricks debug evidence for support tickets and troubleshooting. Use when encountering persistent issues, preparing support tickets, or collecting diagnostic information for Databricks problems. Trigger with phrases like "databricks debug", "databricks support bundle", "collect databricks logs", "databricks diagnostic".
curl "https://skillshub.wtf/jeremylongshore/claude-code-plugins-plus-skills/databricks-debug-bundle?format=md"Databricks Debug Bundle
Current State
!databricks --version 2>/dev/null || echo 'CLI not installed'
!python3 -c "import databricks.sdk; print(f'SDK {databricks.sdk.__version__}')" 2>/dev/null || echo 'SDK not installed'
Overview
Collect all diagnostic information needed for Databricks support tickets: environment info, cluster state, cluster events, job run details, Spark driver logs, and Delta table history. Produces a redacted tar.gz bundle safe to share with support.
Prerequisites
- Databricks CLI installed and configured
- Access to cluster logs (admin or cluster owner)
- Permission to access job run details
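Before running the collection script, it is worth confirming the CLI can actually reach the workspace. A minimal preflight sketch, assuming a default profile in ~/.databrickscfg or DATABRICKS_HOST/DATABRICKS_TOKEN in the environment:
# Preflight: confirm the CLI is authenticated before collecting anything
if databricks current-user me >/dev/null 2>&1; then
  echo "Databricks CLI authenticated"
else
  echo "Databricks CLI not authenticated; run 'databricks configure' or set DATABRICKS_HOST/DATABRICKS_TOKEN" >&2
fi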
Instructions
Step 1: Create Debug Collection Script
#!/bin/bash
set -euo pipefail
# databricks-debug-bundle.sh [cluster_id] [run_id] [table_name]
BUNDLE_DIR="databricks-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"
CLUSTER_ID="${1:-}"
RUN_ID="${2:-}"
TABLE_NAME="${3:-}"
echo "=== Databricks Debug Bundle ===" | tee "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$BUNDLE_DIR/summary.txt"
echo "Workspace: ${DATABRICKS_HOST:-unset}" >> "$BUNDLE_DIR/summary.txt"
Step 2: Collect Environment Info
{
echo ""
echo "--- Environment ---"
echo "CLI: $(databricks --version 2>&1)"
echo "SDK: $(pip show databricks-sdk 2>/dev/null | grep Version || echo 'not installed')"
echo "Python: $(python3 --version 2>&1)"
echo "OS: $(uname -srm)"
echo ""
echo "--- Current User ---"
databricks current-user me --output json 2>&1 | jq '{userName, active}' || echo "Auth failed"
} >> "$BUNDLE_DIR/summary.txt"
Step 3: Collect Cluster Information
if [ -n "$CLUSTER_ID" ]; then
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- Cluster: $CLUSTER_ID ---" >> "$BUNDLE_DIR/summary.txt"
# Full cluster config
databricks clusters get --cluster-id "$CLUSTER_ID" --output json \
> "$BUNDLE_DIR/cluster_config.json" 2>&1 || true
# Key fields summary (|| true keeps the bundle going under set -e if a lookup fails)
jq '{state, spark_version, node_type_id, num_workers,
autotermination_minutes, termination_reason}' \
"$BUNDLE_DIR/cluster_config.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null || true
# Recent cluster events (state changes, errors, resizing)
databricks clusters events --cluster-id "$CLUSTER_ID" --limit 30 --output json \
> "$BUNDLE_DIR/cluster_events.json" 2>&1 || true
# Extract event timeline
jq -r '.events[]? | "\(.timestamp): \(.type) — \(.details // "no details")"' \
"$BUNDLE_DIR/cluster_events.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null || true
fi
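When triaging, the termination and error events are usually what matter most. A quick filter over the collected events file; the event type keywords below are assumptions based on common Databricks cluster event types, so adjust them to what you actually see in cluster_events.json:
# Surface only termination/error style events from the collected timeline
jq -r '.events[]? | select(.type | test("TERMINAT|ERROR|FAIL"; "i")) | "\(.timestamp): \(.type)"' \
  "$BUNDLE_DIR/cluster_events.json" 2>/dev/null || true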
Step 4: Collect Job Run Information
if [ -n "$RUN_ID" ]; then
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- Run: $RUN_ID ---" >> "$BUNDLE_DIR/summary.txt"
# Full run details
databricks runs get --run-id "$RUN_ID" --output json \
> "$BUNDLE_DIR/run_details.json" 2>&1 || true
# Run state summary (|| true so a failed lookup doesn't abort the whole bundle)
jq '{state: .state, start_time, end_time, run_duration}' \
"$BUNDLE_DIR/run_details.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null || true
# Task-level breakdown
jq -r '.tasks[]? | " Task \(.task_key): \(.state.result_state // "RUNNING") — \(.state.state_message // "ok")"' \
"$BUNDLE_DIR/run_details.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null || true
# Run output (error messages, stdout)
databricks runs get-output --run-id "$RUN_ID" --output json \
> "$BUNDLE_DIR/run_output.json" 2>&1 || true
jq '{error, error_trace: (.error_trace // "" | .[0:2000])}' \
"$BUNDLE_DIR/run_output.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null || true
fi
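Support tickets usually benefit from a direct link to the failing run. The Jobs API returns a run_page_url field in the run details, so it can be pulled straight from the file just collected, with a fallback in case the field is absent:
# Append the run page URL so support can open the run directly
jq -r '"Run URL: \(.run_page_url // "unavailable")"' \
  "$BUNDLE_DIR/run_details.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null || true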
Step 5: Collect Spark Driver Logs
if [ -n "$CLUSTER_ID" ]; then
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- Spark Driver Logs (last 500 lines) ---" >> "$BUNDLE_DIR/summary.txt"
python3 << PYEOF > "$BUNDLE_DIR/driver_logs.txt" 2>&1 || true
import base64
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
try:
    # Unquoted heredoc so the shell substitutes ${CLUSTER_ID} into the DBFS path
    resp = w.dbfs.read("/cluster-logs/${CLUSTER_ID}/driver/log4j-active.log")
    # DBFS read returns base64-encoded content; keep the last 500 lines
    text = base64.b64decode(resp.data or "").decode(errors="replace")
    print("\n".join(text.splitlines()[-500:]))
except Exception as e:
    print(f"Could not fetch driver logs: {e}")
    print("Tip: Enable cluster log delivery in cluster config for persistent logs")
PYEOF
fi
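Whether that DBFS log path exists at all depends on cluster log delivery being enabled. A quick check against the config already collected; cluster_log_conf is the field in the cluster spec that controls log delivery:
# Note whether cluster log delivery is configured (this drives the DBFS path above)
jq '.cluster_log_conf // "cluster_log_conf not set; driver logs are not persisted to DBFS"' \
  "$BUNDLE_DIR/cluster_config.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null || true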
Step 6: Collect Delta Table Diagnostics
if [ -n "$TABLE_NAME" ]; then
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- Delta Table: $TABLE_NAME ---" >> "$BUNDLE_DIR/summary.txt"
python3 << PYEOF > "$BUNDLE_DIR/delta_diagnostics.txt" 2>&1 || true
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
print("=== Table Details ===")
spark.sql("DESCRIBE DETAIL ${TABLE_NAME}").show(truncate=False)
print("\n=== Recent History (last 20 operations) ===")
spark.sql("DESCRIBE HISTORY ${TABLE_NAME} LIMIT 20").show(truncate=False)
print("\n=== Schema ===")
spark.sql("DESCRIBE ${TABLE_NAME}").show(truncate=False)
print("\n=== File Stats ===")
detail = spark.sql("DESCRIBE DETAIL ${TABLE_NAME}").first()
print(f"Files: {detail.numFiles}, Size: {detail.sizeInBytes / 1024 / 1024:.1f} MB")
PYEOF
fi
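The Delta step relies on Databricks Connect being installed and pointed at running compute; if the session fails to start, this sketch narrows it down (using DATABRICKS_CLUSTER_ID as the target-cluster setting is an assumption; your setup may use a config profile instead):
# Sanity check for the Delta diagnostics step
pip show databricks-connect >/dev/null 2>&1 \
  && echo "databricks-connect installed" \
  || echo "databricks-connect not installed; the Delta table step will fail"
echo "Databricks Connect target cluster: ${DATABRICKS_CLUSTER_ID:-not set}"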
Step 7: Package Bundle (Redacted)
# Redact sensitive data from config snapshot
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- Config (redacted) ---" >> "$BUNDLE_DIR/summary.txt"
if [ -f ~/.databrickscfg ]; then
  # Single pass avoids sed -i, which differs between GNU and BSD/macOS sed
  sed -e 's/^token[[:space:]]*=.*/token = ***REDACTED***/' \
      -e 's/^client_secret[[:space:]]*=.*/client_secret = ***REDACTED***/' \
      ~/.databrickscfg > "$BUNDLE_DIR/config-redacted.txt"
fi
# Network connectivity test
echo "--- Network ---" >> "$BUNDLE_DIR/summary.txt"
echo -n "API reachable: " >> "$BUNDLE_DIR/summary.txt"
curl -s -o /dev/null -w "%{http_code}" \
"${DATABRICKS_HOST}/api/2.0/clusters/list" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"
# Create archive
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
rm -rf "$BUNDLE_DIR"
echo ""
echo "Bundle created: $BUNDLE_DIR.tar.gz"
echo "Contents: summary.txt, cluster_config.json, cluster_events.json,"
echo " run_details.json, run_output.json, driver_logs.txt,"
echo " delta_diagnostics.txt, config-redacted.txt"
Output
databricks-debug-YYYYMMDD-HHMMSS.tar.gz containing:
- summary.txt — Human-readable diagnostic summary
- cluster_config.json — Full cluster configuration
- cluster_events.json — State changes, errors, resizing events
- run_details.json — Job run with task-level breakdown
- run_output.json — Stdout/stderr and error traces
- driver_logs.txt — Last 500 lines of Spark driver log
- delta_diagnostics.txt — Table details, history, schema
- config-redacted.txt — CLI config with secrets removed
Sensitive Data Handling
| Item | Included | Notes |
|---|---|---|
| Tokens/secrets | NEVER | Redacted with ***REDACTED*** |
| PII in logs | Review before sharing | Scan driver_logs.txt manually |
| Cluster IDs | Yes | Safe to share with support |
| Error traces | Yes | Check for embedded connection strings |
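Before attaching the bundle, a last scan over its contents is a cheap safety net. The pattern below is only a heuristic (Databricks personal access tokens start with dapi); it does not guarantee nothing sensitive remains:
# Heuristic secret scan over the packaged bundle before sharing
BUNDLE=databricks-debug-YYYYMMDD-HHMMSS.tar.gz   # replace with the file the script printed
tar -xzOf "$BUNDLE" | grep -nEi 'dapi[0-9a-f]+|Bearer [A-Za-z0-9]' \
  && echo "Possible secrets found; review before sharing" \
  || echo "No obvious secrets detected"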
Examples
Usage
# Environment only
bash databricks-debug-bundle.sh
# With cluster diagnostics
bash databricks-debug-bundle.sh 0123-456789-abcde
# With cluster + job run
bash databricks-debug-bundle.sh 0123-456789-abcde 12345
# Full diagnostics including Delta table
bash databricks-debug-bundle.sh 0123-456789-abcde 12345 catalog.schema.table
Submit to Support
- Generate bundle: bash databricks-debug-bundle.sh [args]
- Review summary.txt for sensitive data
- Open ticket at help.databricks.com
- Attach the .tar.gz bundle
- Include workspace ID (found in workspace URL: adb-<workspace-id>)
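If the workspace URL follows the Azure adb-<workspace-id> pattern, the ID can be read straight from DATABRICKS_HOST; on other clouds it appears elsewhere (for example after o= in the workspace URL on AWS), so treat this as a convenience, not a rule:
# Extract the workspace ID from an Azure-style adb-<id> host name
echo "${DATABRICKS_HOST:-}" | grep -oE 'adb-[0-9]+' | sed 's/^adb-//'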
Next Steps
For rate limit issues, see databricks-rate-limits.