found 38 skills in registry
Parse and generate CSV files with the csv package — stream large files, handle custom delimiters, transform records, validate data, and generate CSV output from objects. Use when tasks involve data import/export, ETL pipelines, processing uploaded CSV files, or generating downloadable reports.
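A minimal sketch of the streaming parse-transform-write pattern, assuming the "csv package" refers to Python's standard library csv module; the delimiter and column names are illustrative, not part of the skill:

```python
import csv

# Transform applied to each record; the "amount" column is a hypothetical example.
def transform(row: dict) -> dict:
    row["amount"] = f"{float(row['amount']):.2f}"  # normalize numeric formatting
    return row

# Streaming row by row keeps memory flat even for very large files.
with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.DictReader(src, delimiter=";")          # custom delimiter
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow(transform(row))
```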
Process large-scale data with Apache Spark. Use when a user asks to process big data, run distributed computations, build ETL pipelines, perform data analysis at scale, or use PySpark for data engineering.
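A short PySpark sketch of the kind of batch aggregation this covers; the storage paths and column names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Read a partitioned dataset, aggregate per day, and write the summary back out.
events = spark.read.parquet("s3://bucket/events/")  # hypothetical path
daily = (
    events
    .withColumn("day", F.to_date("event_ts"))
    .groupBy("day")
    .agg(
        F.count("*").alias("events"),
        F.countDistinct("user_id").alias("users"),
    )
)
daily.write.mode("overwrite").parquet("s3://bucket/daily_summary/")
```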
Expert guidance for Cube, the headless BI and semantic layer that sits between your data warehouse and analytics applications. Helps developers define data models, create metrics APIs, and build analytics features in applications with consistent, governed access to business metrics.
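One way an application might consume a Cube metrics API, sketched with Python's requests library; the endpoint, token, and member names (orders.count, orders.status, orders.created_at) are assumptions for illustration:

```python
import requests

# Query a governed metric through Cube's REST API instead of the warehouse directly.
query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {"dimension": "orders.created_at", "granularity": "month", "dateRange": "last 6 months"}
    ],
}
resp = requests.post(
    "https://analytics.example.com/cubejs-api/v1/load",  # placeholder host
    json={"query": query},
    headers={"Authorization": "REPLACE_WITH_API_TOKEN"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["data"]:
    print(row)
```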
Expert guidance for Pandera, the Python library for validating pandas and Polars DataFrames with expressive schemas. Helps developers define data contracts, validate data pipelines, and catch data quality issues before they corrupt downstream systems.
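A small Pandera data contract for a hypothetical orders DataFrame; validation raises a SchemaError as soon as a check fails, stopping bad data before it reaches downstream systems:

```python
import pandas as pd
import pandera as pa

# Types, ranges, uniqueness, and allowed values declared up front as a contract.
schema = pa.DataFrameSchema(
    {
        "order_id": pa.Column(int, unique=True),
        "amount": pa.Column(float, pa.Check.ge(0)),
        "status": pa.Column(str, pa.Check.isin(["pending", "shipped", "delivered"])),
    },
    strict=True,  # reject unexpected columns
)

df = pd.DataFrame(
    {"order_id": [1, 2], "amount": [19.99, 5.00], "status": ["pending", "shipped"]}
)
validated = schema.validate(df)  # raises pandera.errors.SchemaError on violations
```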
Validate data quality in CSV, JSON, and database exports by checking for missing values, type mismatches, duplicates, outliers, and schema violations. Use when building ETL pipelines, auditing data imports, checking data freshness, or ensuring data contracts between teams. Trigger words: data quality, validation, null values, duplicates, schema check, data contract, ETL, pipeline, data drift.
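A lightweight pandas-based audit along these lines, a sketch rather than the skill's actual implementation; the customer_id contract rule and file name are hypothetical:

```python
import pandas as pd

df = pd.read_csv("export.csv")  # hypothetical export

# Summary of missing values, duplicates, and inferred types.
report = {
    "rows": len(df),
    "null_counts": df.isna().sum().to_dict(),
    "duplicate_rows": int(df.duplicated().sum()),
    "dtypes": df.dtypes.astype(str).to_dict(),
}

# Example rule from a data contract: customer_id must be present and unique.
violations = []
if df["customer_id"].isna().any():
    violations.append("customer_id has nulls")
if df["customer_id"].duplicated().any():
    violations.append("customer_id has duplicates")

print(report)
print(violations)
```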
Build workflow automations with n8n. Use when a user asks to automate business workflows, connect APIs visually, build integrations between apps, self-host a Zapier alternative, or create data pipelines with a visual editor.
When the user needs to migrate data between databases, transform schemas, or consolidate data sources. Use when the user mentions "data migration," "database migration," "migrate from MySQL to PostgreSQL," "schema migration," "ETL pipeline," "data transfer," "database consolidation," "legacy migration," or "move data between databases." Covers schema analysis, mapping, transformation, batch processing, validation, and cutover planning. For query optimization during migration, see sql-optimizer.
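A hedged sketch of the batch-copy-plus-validation step using SQLAlchemy and pandas; the connection strings, table, and column names are placeholders, and a real migration would add type mapping, retries, and cutover checks:

```python
import pandas as pd
from sqlalchemy import create_engine

src = create_engine("mysql+pymysql://user:pass@legacy-host/app")        # source
dst = create_engine("postgresql+psycopg2://user:pass@new-host/app")     # target

# Copy in batches so memory stays bounded.
for chunk in pd.read_sql("SELECT * FROM customers", src, chunksize=10_000):
    # Example schema transformation: rename a column to match the target model.
    chunk = chunk.rename(columns={"cust_name": "customer_name"})
    chunk.to_sql("customers", dst, if_exists="append", index=False)

# Basic validation after the copy: row counts must match.
src_count = pd.read_sql("SELECT COUNT(*) AS n FROM customers", src)["n"].iloc[0]
dst_count = pd.read_sql("SELECT COUNT(*) AS n FROM customers", dst)["n"].iloc[0]
assert src_count == dst_count, "row count mismatch"
```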
Use when migrating a dbt project from one data platform or data warehouse to another (e.g., Snowflake to Databricks, Databricks to Snowflake) using dbt Fusion's real-time compilation to identify and fix SQL dialect differences.
Azure Event Hubs SDK for Rust. Use for sending and receiving events, streaming data ingestion. Triggers: "event hubs rust", "ProducerClient rust", "ConsumerClient rust", "send event rust", "streaming rust".
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.
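A minimal DAG showing a sensor feeding an operator, assuming a recent Airflow 2 release where `schedule` replaces `schedule_interval`; the file path and schedule are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor

# Daily pipeline: wait for a file to land, then process it.
with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/data/incoming/export.csv",  # hypothetical path
        poke_interval=300,
    )

    def load(**context):
        print("loading data for", context["ds"])

    ingest = PythonOperator(task_id="ingest", python_callable=load)

    wait_for_file >> ingest
```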
Writes and executes SQL queries against the data warehouse using dbt's Semantic Layer or ad-hoc SQL to answer business questions. Use when a user asks about analytics, metrics, KPIs, or data (e.g., "What were total sales last quarter?", "Show me top customers by revenue"). NOT for validating, testing, or building dbt models during development.
Orchestrate data pipelines with Airflow. Covers the TaskFlow API, sensors, XCom, and scheduling.
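The same kind of pipeline written with the TaskFlow API, where task return values move between tasks through XCom implicitly; the dates and sample records are placeholders:

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
def taskflow_example():
    @task
    def extract() -> list[dict]:
        # In a real DAG this would pull from an API or database.
        return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]

    @task
    def total(records: list[dict]) -> float:
        # The records argument arrives via XCom from the upstream task.
        return sum(r["amount"] for r in records)

    total(extract())

taskflow_example()
```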
Build real-time streaming applications with Azure Event Hubs SDK for Java. Use when implementing event streaming, high-throughput data ingestion, or building event-driven architectures.
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective pipelines for batch and streaming data processing.
This skill enables Claude to preprocess and clean data with automated pipelines, streamlining data preparation for machine learning and applying best practices for data validation, transformation, and error handling. Use it when the user requests data preprocessing, data cleaning, or ETL tasks, or mentions automated pipelines for data preparation. Trigger terms include "preprocess data", "clean data", "ETL pipeline", and "data transformation".
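A representative pandas cleaning pass of the kind such a pipeline might automate; the file names, column names, and rules are assumptions for illustration:

```python
import pandas as pd

df = pd.read_csv("raw.csv")  # hypothetical input

df = df.drop_duplicates()
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")  # bad dates -> NaT
df["age"] = pd.to_numeric(df["age"], errors="coerce")                   # bad numbers -> NaN
df["age"] = df["age"].fillna(df["age"].median())                        # impute missing ages
df = df.dropna(subset=["customer_id"])                                  # required key must exist
df["country"] = df["country"].str.strip().str.upper()                   # normalize categoricals

df.to_parquet("clean.parquet", index=False)
```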
Azure Event Hubs SDK for .NET. Use for high-throughput event streaming: sending events (EventHubProducerClient, EventHubBufferedProducerClient), receiving events (EventProcessorClient with checkpointing), partition management, and real-time data ingestion. Triggers: "Event Hubs", "event streaming", "EventHubProducerClient", "EventProcessorClient", "send events", "receive events", "checkpointing", "partition".
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
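A brief sketch using the clickhouse-connect Python client against an assumed events table; host, credentials, and schema are placeholders:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", username="default", password="")

# MergeTree table ordered by the columns most queries filter and group on.
client.command(
    """
    CREATE TABLE IF NOT EXISTS events (
        event_date Date,
        user_id UInt64,
        event_type LowCardinality(String)
    )
    ENGINE = MergeTree
    ORDER BY (event_date, user_id)
    """
)

# Typical analytical query: aggregate over a large table.
result = client.query(
    "SELECT event_type, count() AS n FROM events GROUP BY event_type ORDER BY n DESC"
)
for row in result.result_rows:
    print(row)
```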
Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Works with Apache Spark, dbt, Airflow, and cloud-native data platforms. Use PROACTIVELY for data pipeline design, analytics infrastructure, or modern data stack implementation.