> azure-hdinsight

Expert knowledge for Azure HDInsight development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when building, debugging, or optimizing Azure HDInsight applications. Not for Azure Databricks (use azure-databricks), Azure Synapse Analytics (use azure-synapse-analytics), Azure Stream Analytics (use azure-stream-analytics), Azure Data Explorer (use azure-data-ex

Azure HDInsight Skill

This skill provides expert guidance for Azure HDInsight, covering troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. It combines local quick-reference content with remote documentation fetching.

How to Use This Skill

IMPORTANT for Agent: Use the Category Index below to locate relevant sections. For categories with line ranges (e.g., L35-L120), use read_file with the specified lines. For categories with file links (e.g., [security.md](security.md)), use read_file on the linked reference file.
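
As a minimal sketch of how a line-range entry from the Category Index might be turned into arguments for a file read, the helper below parses a value such as `L35-L120` into 1-based inclusive line numbers. The names `parse_line_range` and `select_lines` are hypothetical, and `select_lines` is only a stand-in, since the real `read_file` tool signature is not specified here.

```python
from __future__ import annotations

import re

# Matches Category Index ranges written as "L<start>-L<end>", e.g. "L37-L132".
_RANGE_RE = re.compile(r"^L(\d+)-L(\d+)$")


def parse_line_range(spec: str) -> tuple[int, int]:
    """Return (start, end) as 1-based inclusive line numbers."""
    match = _RANGE_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"not a line range: {spec!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {spec!r}")
    return start, end


def select_lines(text: str, spec: str) -> list[str]:
    """Hypothetical stand-in for read_file: slice the requested lines out of text."""
    start, end = parse_line_range(spec)
    return text.splitlines()[start - 1:end]
```

For example, `parse_line_range("L37-L132")` yields `(37, 132)`, which can then be passed to whatever file-reading tool is available.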

IMPORTANT for Agent: If metadata.generated_at is more than 3 months old, suggest that the user pull the latest version from the repository. If the mcp_microsoftdocs tools are not available, suggest that the user install them (see the Installation Guide).
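
The staleness rule above could be checked roughly as follows. This sketch assumes `metadata.generated_at` is an ISO 8601 date or timestamp string (the format is not stated here) and approximates "3 months" as 90 days. The helper name `is_stale` and the `max_age_days` parameter are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def is_stale(generated_at: str, now: datetime | None = None, max_age_days: int = 90) -> bool:
    """Return True if generated_at is older than max_age_days (assumed ~3 months)."""
    # Normalize a trailing "Z" so older Python versions can parse the timestamp.
    generated = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are treated as UTC.
    if generated.tzinfo is None:
        generated = generated.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return current - generated > timedelta(days=max_age_days)
```

When `is_stale(...)` returns `True`, the agent would suggest pulling the latest version of the skill from the repository.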

This skill requires network access to fetch documentation content:

  • Preferred: Use mcp_microsoftdocs:microsoft_docs_fetch with query string from=learn-agent-skill. Returns Markdown.
  • Fallback: Use fetch_webpage with query string from=learn-agent-skill&accept=text/markdown. Returns Markdown.
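
A sketch of how the query strings above might be attached to a documentation URL before fetching. It uses only Python's standard `urllib.parse` module. The helper name `with_skill_query` and its `fallback` flag are hypothetical, and the fetch itself is left to whichever tool (`mcp_microsoftdocs:microsoft_docs_fetch` or `fetch_webpage`) is available.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_skill_query(url: str, fallback: bool = False) -> str:
    """Append from=learn-agent-skill (and accept=text/markdown for the fallback)."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already carries.
    query = dict(parse_qsl(parts.query))
    query["from"] = "learn-agent-skill"
    if fallback:
        # The fallback tool asks for Markdown explicitly.
        query["accept"] = "text/markdown"
    # safe="/" keeps the slash in text/markdown unescaped, as written above.
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))
```

Any URL from the topic tables in this skill can be passed through the helper, with `fallback=True` when only `fetch_webpage` is available.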

Category Index

| Category | Lines | Description |
| --- | --- | --- |
| Troubleshooting | L37-L132 | Diagnosing and fixing HDInsight cluster issues: creation/auth, networking, storage, Ambari/HDFS/Hive/HBase/Kafka/Spark/YARN problems, performance, disk/CPU, and known error codes/workarounds. |
| Best Practices | L133-L174 | Best practices for designing, securing, monitoring, scaling, and tuning HDInsight clusters and workloads (Hadoop, Spark, Hive, HBase, Kafka), including storage, migration, and performance optimization. |
| Decision Making | L175-L199 | Guidance on planning, sizing, upgrading, and migrating HDInsight clusters, including Hadoop, HBase, Kafka, storage, VM sizing, and handling version/feature retirements. |
| Architecture & Design Patterns | L200-L214 | HDInsight cluster architecture, security/VNet design, HA/DR and business continuity patterns, migration from on-prem Hadoop, shared storage, streaming (Spark/YARN), and Oozie-based pipelines. |
| Limits & Quotas | L215-L222 | Guidance on HDInsight capacity limits: log size/retention, supported cluster node sizes, external metastore constraints, and requesting/managing CPU core quota increases. |
| Security | L223-L266 | Securing HDInsight clusters: identity and access (Entra, LDAP, Ranger, RBAC), network isolation (NSG, Private Link), TLS/encryption, Kafka/Hive/Spark security, and security best practices. |
| Configuration | L267-L323 | Configuring and tuning HDInsight clusters: networking/VPN, Ambari/Hive/Spark/HBase settings, autoscale, monitoring/logging, SSH/Jupyter/VS Code access, and script-based customizations. |
| Integrations & Coding Patterns | L324-L391 | Patterns and code samples for integrating HDInsight (Hive, Spark, Kafka, HBase, MapReduce, Sqoop) with tools, SDKs, REST/CLI, and external services like SQL, Cosmos DB, Power BI, IoT, and Synapse. |
| Deployment | L392-L405 | Creating, configuring, migrating, and automating HDInsight clusters (Hadoop, HBase, Kafka) using portal, CLI, PowerShell, ARM/REST, Data Factory, Marketplace, AMA, and runbooks. |

Troubleshooting

| Topic | URL |
| --- | --- |
| Address reliability issues on older HDInsight images | https://learn.microsoft.com/en-us/azure/hdinsight/cluster-reliability-issues |
| Fix component version validation errors in HDInsight ARM templates | https://learn.microsoft.com/en-us/azure/hdinsight/component-version-validation-error-arm-templates |
| Troubleshoot Azure HDInsight cluster creation errors | https://learn.microsoft.com/en-us/azure/hdinsight/create-cluster-error-dictionary |
| Troubleshoot authentication issues for secure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/domain-joined-authentication-issues |
| Run diagnostic script when HDInsight cluster creation fails with DomainNotFound | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/sample-script |
| Fix DomainNotFound errors during HDInsight cluster creation | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/troubleshoot-domainnotfound |
| Fix Apache Ambari directory alerts in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-directory-alerts |
| Troubleshoot Ambari UI down hosts and services in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-down-hosts-services |
| Fix Apache Ambari UI 502 errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-fivezerotwo-error |
| Resolve Apache Ambari heartbeat issues in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-heartbeat-issues |
| Troubleshoot Apache Ambari Metrics Collector on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-metricservice-issues |
| Resolve Apache Ambari stale alerts in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-stale-alerts |
| Fix local HDFS stuck in safe mode on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-hdfs-troubleshoot-safe-mode |
| Fix HDInsight cluster creation failures | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-cluster-creation-fails |
| Convert service principal certificates to base-64 for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-converting-service-principal-certificate |
| Resolve Data Lake storage file access issues in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-data-lake-files |
| Fix InvalidNetworkSecurityGroupSecurityRules for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-invalidnetworksecuritygroupsecurityrules-cluster-creation-fails |
| Resolve HDInsight node disk space exhaustion | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-out-disk-space |
| Fix Watchdog BUG soft lockup CPU errors in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-soft-lockup-cpu |
| Resolve node addition failures in HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-unable-add-nodes |
| Troubleshoot login failures to HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-unable-log-in-cluster |
| Manage and troubleshoot disk space issues in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-disk-space |
| Resolve InvalidNetworkConfigurationErrorCode in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-invalidnetworkconfigurationerrorcode-cluster-creation-fails |
| Restore Key Vault access for encrypted HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-lost-key-vault-access |
| Fix port conflicts when starting HDInsight services | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-port-conflict |
| Fix 'account does not support http' storage errors in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-wasbs-storage-exception |
| Fix invalid BCFile errors when reading YARN logs | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-yarn-log-invalid-bcfile |
| Resolve BindException address-in-use on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-bindexception-address-use |
| Fix HBase hbck inconsistency errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-hbase-hbck-inconsistencies |
| Troubleshoot pegged CPU on HBase region servers | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-pegged-cpu-region-server |
| Resolve Apache Phoenix connectivity issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-phoenix-connectivity |
| Fix missing data in Phoenix views after HDP upgrade | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-phoenix-no-data |
| Fix HBase REST service not responding on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-rest-not-spending |
| Fix HBase Master startup failures on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-start-fails |
| Resolve storage exceptions after connection reset | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-storage-exception-reset |
| Resolve timeouts with hbase hbck on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-timeouts-hbase-hbck |
| Troubleshoot HBase region server issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-unassigned-regions |
| Fix HBase TTL data retention issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/troubleshoot-data-retention-issues-expired-data |
| Troubleshoot HBase REST API issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/troubleshoot-rest-api |
| Access and interpret YARN application logs on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-access-yarn-app-logs-linux |
| Enable and collect Hadoop heap dumps on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-collect-debug-heap-dump-linux |
| Resolve Hive out-of-memory errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-hive-out-of-memory-error-oom |
| Lookup and resolve Hadoop stack trace errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-stack-trace-error-messages |
| Understand and resolve WebHCat errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-templeton-webhcat-debug-errors |
| Known issues and troubleshooting for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues |
| Fix Ambari access failures after certificate rotation | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-ambari-access-certificate-issue |
| Resolve Ambari user switch issues on HDInsight 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-ambari-users-cache |
| Recover HDInsight headnodes from /tmp disk usage leak | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-cluster-head-node-unresponsive |
| Mitigate conda version regression on HDInsight 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-conda-version-regression |
| Resolve Ranger startup failures on ESP HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-ranger-cluster-create-failure |
| Diagnose slow or failing jobs on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-failed-cluster |
| HDInsight troubleshooting guide index | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-guide |
| Troubleshoot HDFS issues in Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-hdfs |
| Common Hive issues and fixes on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-hive |
| Troubleshoot YARN issues in Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-yarn |
| Restore error messages in Ambari Hive View on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-error-message-hive-view |
| Resolve Hive log disk space issues on HDInsight head nodes | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-hive-logs-diskspace-full-headnodes |
| Fix Hive View inaccessibility due to Zookeeper issues | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-inaccessible-hive-view |
| Troubleshoot Hive join OutOfMemory GC overhead errors | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-outofmemory-overhead-exceeded |
| Resolve permission denied errors creating Hive tables | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-permission-error-create-table |
| Diagnose poor Hive LLAP query performance in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-query-performance |
| Fix slow reducers and data skew in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-slow-reducer |
| Troubleshoot Apache Tez application hangs in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-tez-hangs |
| Fix slow or failing Ambari Tez View in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-tez-view-slow |
| Fix Hive View query result timeout in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-view-time-out |
| Correct Hive JDBC URL in Zeppelin interpreter on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-zookeeperhiveclientexception-hiveserver-configs |
| Resolve Ambari Hive View gateway timeout exceptions | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/troubleshoot-gateway-timeout |
| Troubleshoot Hive LLAP workload management issues | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/troubleshoot-workload-management-issues |
| Resolve Kafka broker startup failures from full disks | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/kafka-troubleshoot-full-disk |
| Fix HDInsight Kafka error: insufficient fault domains | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/kafka-troubleshoot-insufficient-domains |
| Debug Spark apps using HDInsight History Server extensions | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-azure-spark-history-server |
| Debug Spark job failures with IntelliJ Azure Toolkit | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-failure-debug |
| Remotely debug Apache Spark apps on HDInsight via IntelliJ | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-plugin-debug-jobs-remotely |
| Debug HDInsight Spark jobs with YARN and Spark UIs | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-job-debugging |
| Known issues and workarounds for HDInsight Spark clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-known-issues |
| Troubleshoot Spark Streaming apps stopping after 24 days | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-application-stops |
| Fix Jupyter 404 'Blocking Cross Origin API' on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-blocking-cross-origin |
| Resolve RequestBodyTooLarge errors in HDInsight Spark streaming | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-event-log-requestbodytoolarge |
| Fix IllegalArgumentException in HDInsight Spark activities | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-illegalargumentexception |
| Resolve InvalidClassException version mismatch in HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-job-fails-invalidclassexception |
| Fix NoClassDefFoundError for Spark-Kafka on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-job-fails-noclassdeffounderror |
| Improve slow Spark jobs with many Azure Storage files | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-job-slowness-container |
| Resolve OutOfMemoryError in HDInsight Spark clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-outofmemory |
| Resolve RpcTimeoutException and 502 errors in Spark Thrift on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-rpctimeoutexception |
| Troubleshoot large result downloads via JDBC/ODBC and Thrift on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-sparkexception-kryo-serialization-failed |
| Common Spark issues and fixes on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-troubleshoot-spark |
| Debug WASB file operations for HDInsight storage issues | https://learn.microsoft.com/en-us/azure/hdinsight/spark/troubleshoot-debug-wasb |
| Fix Jupyter Notebook creation issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/troubleshoot-jupyter-notebook-convert |
| Troubleshoot Apache Oozie workflows on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-oozie |
| Resolve Azure HDInsight resource creation capacity errors | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-resource-creation-fails |
| Troubleshoot script action failures in Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-script-action |
| Work around Sqoop import/export failures on ESP HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-sqoop |

Best Practices

| Topic | URL |
| --- | --- |
| Use Azure Monitor logs for HDInsight availability | https://learn.microsoft.com/en-us/azure/hdinsight/cluster-availability-monitor-logs |
| Apply cluster management best practices in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/cluster-management-best-practices |
| Apply general best practices for HDInsight Enterprise Security | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/general-guidelines |
| Plan and execute data migration to Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-data-migration |
| Apply infrastructure best practices for Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-infrastructure |
| Implement storage best practices for HDInsight migrations | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-storage |
| Optimize HDInsight HBase with Accelerated Writes | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-accelerated-writes |
| Apply HBase performance advisor recommendations on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-advisor |
| Tune Apache Phoenix performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-phoenix-performance |
| Tune Apache HBase performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/troubleshoot-hbase-performance-issues |
| Scale HiveServer2 on HDInsight using edge nodes | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-install-hiveserver2 |
| Monitor HDInsight availability with Apache Ambari | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-cluster-availability |
| Create HDInsight clusters with secure transfer-enabled storage accounts | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-with-secure-transfer-storage |
| Apply Linux-specific tips for Hadoop on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-linux-information |
| Optimize Apache Hive query performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-optimize-hive-query |
| Monitor and optimize HDInsight cluster performance | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-key-scenarios-to-monitor |
| Schedule and apply OS patches for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-os-patching |
| Apply pre-creation best practices for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-overview-before-you-start |
| Manually scale HDInsight clusters for workload patterns | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-scaling-best-practices |
| Apply gateway best practices for Hive on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/gateway-best-practices |
| Operate LLAP schedule-based autoscale on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/llap-schedule-based-autoscale-best-practices |
| Configure Kafka partition replicas for high availability | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-high-availability |
| Mirror Kafka topics between HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-mirroring |
| Tune Kafka on HDInsight for optimal performance | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-performance-tuning |
| Configure managed disks to scale Kafka on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-scalability |
| Migrate HDInsight Log Analytics data to new tables | https://learn.microsoft.com/en-us/azure/hdinsight/log-analytics-migration |
| Use Azure Storage effectively as HDInsight default filesystem | https://learn.microsoft.com/en-us/azure/hdinsight/overview-azure-storage |
| Leverage Data Lake Storage Gen2 with HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/overview-data-lake-storage-gen2 |
| Optimize Apache Spark job performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-perf |
| Manage Python packages for Jupyter on HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-python-package-installation |
| Configure Spark Streaming on HDInsight for exactly-once processing | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-streaming-exactly-once |
| Optimize Apache Spark cluster configuration on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-cluster-configuration |
| Optimize data processing operations for Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-processing |
| Optimize data storage for Apache Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-storage |
| Tune Apache Spark memory usage on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-memory-usage |
| Safely manage JAR dependencies on HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/safely-manage-jar-dependency |
| Apply Apache Spark performance guidelines on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/spark-best-practices |
| Use SparkCruise to optimize Spark queries on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/spark-cruise |

Decision Making

| Topic | URL |
| --- | --- |
| Plan ETL at scale with Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-etl-at-scale |
| Assess benefits of migrating on-premises Hadoop to Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-motivation |
| Choose HDInsight tools for custom MapReduce jobs | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-run-custom-programs |
| Choose backup and replication options for HBase | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-backup-replication |
| Migrate Apache HBase clusters to HDInsight 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-hdinsight-5-1 |
| Migrate HBase to HDInsight 5.1 with a new storage account | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-hdinsight-5-1-new-storage-account |
| Migrate Apache HBase clusters to a newer HDInsight version | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-new-version |
| Migrate HBase to new HDInsight version and storage account | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-new-version-new-storage-account |
| Plan HDInsight cluster capacity and performance | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-capacity-planning |
| Plan for HDInsight component and version retirements | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-component-retirements-and-action-required |
| Compare storage services for Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-compare-storage-options |
| Upgrade Azure HDInsight to Apache Ranger 2.3.0 | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-ranger-5-1-migration |
| Assess and migrate from retired HDInsight versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-retired-versions |
| Select appropriate VM sizes for HDInsight nodes | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-selecting-vm-size |
| Plan migration to newer Azure HDInsight cluster versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-upgrade-cluster |
| Size HDInsight Interactive Query (LLAP) clusters | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-llap-sizing-guide |
| Use Kafka MirrorMaker 2.0 for migration and replication | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/kafka-mirrormaker-2-0-guide |
| Migrate Apache Kafka workloads from HDInsight 4.0 to 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/migrate-5-1-versions |
| Migrate Apache Kafka workloads from HDInsight 3.6 to 4.0 | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/migrate-versions |
| Migrate HDInsight clusters from Basic to Standard Load Balancer | https://learn.microsoft.com/en-us/azure/hdinsight/load-balancer-migration-guidelines |
| Migrate Ambari configurations from HDInsight 4.x to 5.x | https://learn.microsoft.com/en-us/azure/hdinsight/migrate-ambari-recent-version-hdinsight |

Architecture & Design Patterns

| Topic | URL |
| --- | --- |
| Use Apache Ambari for HDInsight cluster management | https://learn.microsoft.com/en-us/azure/hdinsight/apache-ambari-usage |
| Understand HDInsight architecture with Enterprise Security Package | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-architecture |
| Design architecture for migrating on-premises Hadoop to HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-architecture |
| Choose HDInsight business continuity architectures | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-business-continuity-architecture |
| Study HDInsight high availability and DR case design | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-high-availability-case-study |
| Understand HDInsight high availability architecture components | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-high-availability-components |
| Share one Data Lake Storage account across multiple HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-multiple-clusters-data-lake-store |
| Operationalize HDInsight data pipelines with Oozie | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-operationalize-data-pipeline |
| Design scalable streaming architectures with HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-streaming-at-scale-overview |
| Azure HDInsight virtual network architecture and resources | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-virtual-network-architecture |
| Design highly available Spark Streaming jobs on YARN in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-streaming-high-availability |

Limits & Quotas

| Topic | URL |
| --- | --- |
| Plan HDInsight log sizes and retention policies | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-log-management |
| Use supported node configurations for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-supported-node-configuration |
| Use external metastores and understand HDInsight default metastore limits | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-use-external-metadata-stores |
| Request and manage HDInsight CPU core quota increases | https://learn.microsoft.com/en-us/azure/hdinsight/quota-increase-request |

Security

| Topic | URL |
| --- | --- |
| Configure managed identity access to Blob storage for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/configure-azure-blob-storage |
| Configure double disk encryption for HDInsight data at rest | https://learn.microsoft.com/en-us/azure/hdinsight/disk-encryption |
| Configure HDInsight clusters with Entra Domain Services integration | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-configure-using-azure-adds |
| Create and configure HDInsight Enterprise Security Package clusters | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-create-configure-enterprise-security-cluster |
| Manage users, roles, and security for HDInsight ESP clusters | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-manage |
| Configure Apache Ranger policies for HBase with ESP | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-run-hbase |
| Configure Apache Ranger Hive policies in HDInsight ESP | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-run-hive |
| Set Apache Ranger policies for Kafka with ESP | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-run-kafka |
| Implement encryption in transit for Azure HDInsight nodes | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/encryption-in-transit |
| Plan enterprise security options for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/hdinsight-security-overview |
| Secure Oozie workflows with HDInsight Enterprise Security | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/hdinsight-use-oozie-domain-joined-clusters |
| Set up Azure HDInsight ID Broker for OAuth and MFA | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/identity-broker |
| Configure LDAP sync for Ranger and Ambari in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/ldap-sync |
| Manage SSH access for Entra domain accounts on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/ssh-domain-accounts |
| Configure Private Link for HDInsight Kafka REST Proxy | https://learn.microsoft.com/en-us/azure/hdinsight/enable-private-link-on-kafka-rest-proxy-hdi-cluster |
| Implement Enterprise Security Package for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/enterprise-security-package |
| Apply security and DevOps best practices for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-security-devops |
| Manage Ambari Views permissions on ESP-enabled HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-authorize-users-to-ambari |
| Implement non-interactive .NET auth for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-create-non-interactive-authentication-dotnet-applications |
| Use managed identities with Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-managed-identities |
| Allow HDInsight management IPs in NSGs and routes | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-management-ip-addresses |
| Migrate to granular role-based access for HDInsight cluster configurations | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-migrate-granular-access-cluster-configurations |
| Enable Azure Private Link for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-private-link |
| Restrict public connectivity for Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-restrict-public-connectivity |
| Safely rotate HDInsight storage account access keys | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-rotate-storage-keys |
| Use HDInsight NSG service tags for management traffic | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-service-tags |
| Restrict HDInsight Blob data access using SAS tokens | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-storage-sharedaccesssignature-permissions |
| Synchronize Microsoft Entra users to HDInsight ESP clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-sync-aad-users-to-cluster |
| Create and manage Entra ID-authenticated HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/create-clusters-with-entra |
| Configure ARM templates for Entra ID-enabled HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/manage-entra-id-enabled-azure-hdinsight-clusters-with-arm-templates |
| Manage Entra ID-enabled HDInsight clusters via REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/manage-entra-id-enabled-cluster-with-rest-api |
| Configure security options for Hive in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hdinsight-security-options-for-hive |
| Set up TLS and client auth for ESP Kafka clusters | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-esp-kafka-ssl-encryption-authentication |
| Configure TLS encryption and client auth for HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-ssl-encryption-authentication |
| Secure Spark–Kafka streaming integration on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/secure-spark-kafka-streaming-integration-scenario |
| Fetch OAuth tokens from HDInsight to access Azure services | https://learn.microsoft.com/en-us/azure/hdinsight/msi-support-to-access-azure-services |
| Apply built-in Azure Policy definitions for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/policy-reference |
| Configure Ranger policies for Spark SQL in HDInsight ESP | https://learn.microsoft.com/en-us/azure/hdinsight/spark/ranger-policies-for-spark |
Configure TLS versions for Azure HDInsight gatewayshttps://learn.microsoft.com/en-us/azure/hdinsight/transport-layer-security
Configure HDInsight managed identity for SQL authenticationhttps://learn.microsoft.com/en-us/azure/hdinsight/use-managed-identity-for-sql-database-authentication-in-azure-hdinsight

Configuration

| Topic | URL |
|---|---|
| Configure Ambari Web UI auto-logout timeout in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/ambari-web-ui-auto-logout |
| Connect HDInsight clusters to on-premises networks with VPN and DNS | https://learn.microsoft.com/en-us/azure/hdinsight/connect-on-premises-network |
| Configure HBase cluster replication in Azure VNets | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-replication |
| Use HBCK2 to repair HBase on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/how-to-use-hbck2-tool |
| Check HDInsight 4.0 open-source component versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-40-component-versioning |
| Check HDInsight 5.x open-source component versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-5x-component-versioning |
| Manage HDInsight clusters using Azure CLI commands | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-command-line |
| Automate HDInsight cluster management with PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-powershell |
| Configure and use empty edge nodes in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-use-edge-node |
| Configure HDInsight Autoscale policies and limits | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-autoscale-clusters |
| Tune HDInsight cluster settings using Ambari | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-changing-configs-via-ambari |
| Review bundled open-source components and versions in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning |
| Configure Azure HDInsight VS Code extension settings | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-config-for-vscode |
| Create and configure VNets, NSGs, and DNS for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-create-virtual-network |
| Configure custom Ambari database for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-custom-ambari-db |
| Preload Apache Hive libraries during HDInsight cluster creation | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-add-hive-libraries |
| Add extra Azure Storage accounts to existing HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-add-storage |
| Programmatically customize HDInsight cluster configuration with bootstrap | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-customize-cluster-bootstrap |
| Customize HDInsight clusters using script actions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-customize-cluster-linux |
| Connect to Azure HDInsight clusters using SSH | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-linux-use-ssh-unix |
| Enable Azure Monitor logs for HDInsight cluster operations | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial |
| Reference ports for Hadoop services on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-port-settings-for-services |
| Configure and customize HDInsight clusters across tools | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters |
| Develop script actions to configure Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-script-actions-linux |
| Configure SSH tunneling to access HDInsight web UIs | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-linux-ambari-ssh-tunnel |
| Secure HDInsight outbound traffic using Azure Firewall | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-restrict-outbound-traffic |
| Custom-tune HDInsight Autoscale advanced settings | https://learn.microsoft.com/en-us/azure/hdinsight/how-to-custom-configure-hdinsight-autoscale |
| Configure Apache Hive replication on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-replication |
| Migrate Hive default metastore to external SQL Database on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-default-metastore-export-import |
| Configure Hive LLAP workload management pools in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-workload-management |
| Use Hive LLAP workload management commands in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/workload-management-commands |
| Enable automatic topic creation in HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-auto-create-topics |
| Configure VPN and VNet access to HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-connect-vpn-gateway |
| Configure Azure Monitor logs for HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-log-analytics-operations-management |
| Configure cross-VNet connectivity to HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/connect-kafka-cluster-with-vm-in-different-vnet |
| Configure cross-VNet client connectivity to HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/connect-kafka-with-vnet |
| Configure monitoring and alerts for Azure HDInsight with Azure Monitor | https://learn.microsoft.com/en-us/azure/hdinsight/monitor-hdinsight |
| Reference of monitoring data for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/monitor-hdinsight-reference |
| Configure non-Azure Firewall network virtual appliances for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/network-virtual-appliance |
| Optimize HBase performance with Ambari configuration | https://learn.microsoft.com/en-us/azure/hdinsight/optimize-hbase-ambari |
| Optimize Hive performance via Ambari settings in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/optimize-hive-ambari |
| Tune Pig properties with Ambari on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/optimize-pig-ambari |
| Configure selective logging for AMA on HDInsight via script actions | https://learn.microsoft.com/en-us/azure/hdinsight/selective-logging-analysis |
| Configure selective logging for HDInsight clusters with script actions | https://learn.microsoft.com/en-us/azure/hdinsight/selective-logging-analysis-azure-logs |
| Configure service endpoint policies for HDInsight virtual networks | https://learn.microsoft.com/en-us/azure/hdinsight/service-endpoint-policies |
| Set up PySpark interactive environment with VS Code HDInsight Tools | https://learn.microsoft.com/en-us/azure/hdinsight/set-up-pyspark-interactive-environment |
| Configure HDInsight IO Cache to speed up Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-improve-performance-iocache |
| Use HDInsight Spark Jupyter kernels effectively | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-notebook-kernels |
| Configure Jupyter on HDInsight to use Maven packages | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages |
| Configure and scope Spark dependencies on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-manage-dependencies |
| Tune Spark resource configuration on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-resource-manager |
| Configure Apache Spark settings on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-settings |
| Transfer files to Azure HDInsight using SCP | https://learn.microsoft.com/en-us/azure/hdinsight/use-scp |

Integrations & Coding Patterns

| Topic | URL |
|---|---|
| Configure Ambari email alerts with SendGrid in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/apache-ambari-email |
| Stream from Kafka to Azure Cosmos DB with Spark | https://learn.microsoft.com/en-us/azure/hdinsight/apache-kafka-spark-structured-streaming-cosmosdb |
| Execute common HDInsight tasks with Azure CLI samples | https://learn.microsoft.com/en-us/azure/hdinsight/azure-cli-samples |
| Connect Excel to HDInsight Hadoop via Power Query | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-connect-excel-power-query |
| Query HDInsight Hive from Java using JDBC | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-connect-hive-jdbc-driver |
| Visualize HDInsight Hive data in Power BI via ODBC | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-connect-hive-power-bi |
| Integrate C# UDFs with Hive and Pig on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-hive-pig-udf-dotnet-csharp |
| Call WebHCat REST API for Hive with Curl | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-curl |
| Submit Hive jobs using HDInsight .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-dotnet-sdk |
| Run HDInsight Hive queries with Azure PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-powershell |
| Use Visual Studio Data Lake tools for Hive on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-visual-studio |
| Submit MapReduce jobs to HDInsight using Curl and WebHCat | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-curl |
| Submit MapReduce jobs to HDInsight with .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-dotnet-sdk |
| Run HDInsight MapReduce jobs using Azure PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-powershell |
| Run MapReduce jobs on HDInsight via SSH | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-ssh |
| Submit Sqoop jobs to HDInsight via Curl and WebHCat | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-curl |
| Run Sqoop jobs on HDInsight using .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-dotnet-sdk |
| Use Sqoop on HDInsight Linux headnodes for SQL integration | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-mac-linux |
| Submit Sqoop jobs to HDInsight with PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-powershell |
| Use Visual Studio Data Lake Tools with HDInsight Hadoop | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-visual-studio-tools-get-started |
| Configure Beeline connections to HDInsight HiveServer2 | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/connect-install-beeline |
| Run Sqoop jobs between HDInsight and Azure SQL Database | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-use-sqoop |
| Use Python UDFs with Hive and Pig on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/python-udf-hdinsight |
| Submit Hadoop jobs to HDInsight via .NET, curl, and PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/submit-apache-hadoop-jobs-programmatically |
| Build and deploy a Java HBase client with Maven | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-build-java-maven-linux |
| Run HBase SQL queries with Phoenix and Zeppelin | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-phoenix-zeppelin |
| Use the HBase .NET SDK with HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-rest-sdk |
| Use Phoenix Query Server REST SDK on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-using-phoenix-query-server-rest-sdk |
| Use HDInsight .NET SDK for cluster management tasks | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-dotnet-sdk |
| Use Spark DStreams with Kafka on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-with-kafka |
| Install custom Hadoop applications on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-install-custom-applications |
| Use Spark & Hive Tools for VS Code with HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-for-vscode |
| Use the Azure HDInsight SDK for Go with Hadoop clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-go-sdk-overview |
| Install and access Hue on Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-hue-linux |
| Manage HDInsight Hadoop clusters using Ambari REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-manage-ambari-rest-api |
| Run .NET MapReduce jobs on Linux-based HDInsight using Mono | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-migrate-dotnet-to-linux |
| Define and run Oozie workflows on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-use-oozie-linux-mac |
| Use Spark HBase Connector between HDInsight Spark and HBase | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-using-spark-query-hbase |
| Manage Entra-enabled HDInsight clusters using .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/manage-hadoop-cluster-dot-net-sdk |
| Run Hive queries on Entra-enabled HDInsight using PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-apache-hive-queries-using-powershell-on-entra-enabled-hdinsight-cluster |
| Run Hive queries on HDInsight using the REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-apache-hive-queries-using-rest-api |
| Run Hive queries on Entra-enabled HDInsight with .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-hive-queries-using-dot-net-sdk |
| Submit MapReduce jobs to Entra-enabled HDInsight using .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-map-reduce-jobs-dot-net-sdk |
| Run MapReduce jobs on Entra-enabled HDInsight with PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-map-reduce-jobs-entra-id-enabled-using-powershell |
| Run MapReduce jobs on Entra-enabled HDInsight via REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-map-reduce-rest-jobs |
| Submit Spark jobs to Entra-enabled HDInsight via Livy REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-spark-jobs-using-rest-api |
| Use Power BI DirectQuery with HDInsight Hive | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hadoop-connect-hive-power-bi-directquery |
| Integrate Spark and Hive using Hive Warehouse Connector | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-warehouse-connector |
| Run Spark operations via Hive Warehouse Connector | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-warehouse-connector-operations |
| Use Hive Warehouse Connector from Zeppelin via Livy | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-warehouse-connector-zeppelin |
| Use Hive Warehouse Connector APIs on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-warehouse-connector-apis |
| Use Hive Warehouse Connector 2.x APIs on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-warehouse-connector-v2-apis |
| Integrate HDInsight Kafka with Azure IoT Hub | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-connector-iot-hub |
| Use Kafka REST Proxy with HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/rest-proxy |
| Use Kafka REST Proxy on HDInsight via Azure CLI | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/tutorial-cli-rest-proxy |
| Connect Synapse Spark pools to HDInsight external Hive Metastore | https://learn.microsoft.com/en-us/azure/hdinsight/share-hive-metastore-with-synapse |
| Analyze Application Insights telemetry with Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-analyze-application-insight-logs |
| Connect HDInsight Spark to Azure SQL Database | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-connect-to-sql-database |
| Create and submit Scala Spark apps from Eclipse to HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-eclipse-tool-plugin |
| Develop and submit Spark apps with IntelliJ Azure Toolkit | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-plugin |
| Submit remote Spark jobs to HDInsight using Livy REST API | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-livy-rest-interface |
| Integrate Microsoft Cognitive Toolkit with Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-microsoft-cognitive-toolkit |
| Run Azure Machine Learning AutoML on HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-run-machine-learning-automl |
| Run Apache Pig workloads on HDInsight Hadoop | https://learn.microsoft.com/en-us/azure/hdinsight/use-pig |

Deployment

| Topic | URL |
|---|---|
| Migrate HDInsight monitoring to Azure Monitor Agent (AMA) | https://learn.microsoft.com/en-us/azure/hdinsight/azure-monitor-agent |
| Deploy HBase clusters in Azure Virtual Networks | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-provision-vnet |
| Publish Azure HDInsight applications to Azure Marketplace | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-publish-applications |
| Operationalize on-demand HDInsight Hadoop clusters with Data Factory | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-adf |
| Deploy HDInsight clusters using ARM templates | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-arm-templates |
| Provision HDInsight 4.0 clusters using Azure CLI | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-azure-cli |
| Create Linux HDInsight clusters using PowerShell scripts | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-azure-powershell |
| Create HDInsight clusters via Azure REST and ARM templates | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-curl-rest |
| Create Linux-based HDInsight clusters via Azure portal | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-portal |
| Migrate HDInsight Kafka clusters using MirrorMaker 2 | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-mirror-maker-2 |
| Provision and delete HDInsight clusters via Automation runbooks | https://learn.microsoft.com/en-us/azure/hdinsight/manage-clusters-runbooks |

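The fetch guidance earlier in this skill asks that documentation URLs carry the query string `from=learn-agent-skill`, plus `accept=text/markdown` on the `fetch_webpage` fallback. A minimal sketch of how a caller might attach those parameters to any URL from the tables above follows. The helper name `with_skill_query` is hypothetical and not part of the skill or of any Microsoft tooling; only the two query parameters come from this document.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_skill_query(url: str, markdown: bool = False) -> str:
    """Return `url` with from=learn-agent-skill appended (hypothetical helper).

    When `markdown` is True, accept=text/markdown is added as well,
    matching the fallback fetch described in this skill.
    """
    parts = urlsplit(url)
    # Keep any query parameters the URL already carries.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["from"] = "learn-agent-skill"
    if markdown:
        query["accept"] = "text/markdown"
    # safe="/" leaves the slash in text/markdown unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


if __name__ == "__main__":
    doc = "https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-private-link"
    print(with_skill_query(doc))                 # preferred fetch
    print(with_skill_query(doc, markdown=True))  # fallback fetch
```

The helper only builds the URL; issuing the request is left to whichever fetch tool is available.
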