found 38 skills in registry
Parse and generate CSV files with the csv package — stream large files, handle custom delimiters, transform records, validate data, and generate CSV output from objects. Use when tasks involve data import/export, ETL pipelines, processing uploaded CSV files, or generating downloadable reports.
Expert guidance for Cube, the headless BI and semantic layer that sits between your data warehouse and analytics applications. Helps developers define data models, create metrics APIs, and build analytics features in applications with consistent, governed access to business metrics.
Expert guidance for Pandera, the Python library for validating pandas and Polars DataFrames with expressive schemas. Helps developers define data contracts, validate data pipelines, and catch data quality issues before they corrupt downstream systems.
You are an expert in Airbyte, the open-source data integration platform with 300+ pre-built connectors. You help developers sync data from SaaS tools, databases, and APIs into data warehouses and lakes — handling incremental syncs, CDC (Change Data Capture), schema evolution, and error recovery for production data pipelines.
Validate data quality in CSV, JSON, and database exports by checking for missing values, type mismatches, duplicates, outliers, and schema violations. Use when building ETL pipelines, auditing data imports, checking data freshness, or ensuring data contracts between teams. Trigger words: data quality, validation, null values, duplicates, schema check, data contract, ETL, pipeline, data drift.
Build workflow automations with n8n. Use when a user asks to automate business workflows, connect APIs visually, build integrations between apps, self-host a Zapier alternative, or create data pipelines with a visual editor.
You are an expert in dlt, the open-source Python library for building data pipelines. You help developers load data from any API, file, or database into warehouses and lakes using simple Python decorators — with automatic schema inference, incremental loading, and built-in data contracts. dlt is the "requests library for data pipelines."
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Use when migrating a dbt project from one data platform or data warehouse to another (e.g., Snowflake to Databricks, Databricks to Snowflake) using dbt Fusion's real-time compilation to identify and fix SQL dialect differences.
Build real-time streaming applications with Azure Event Hubs SDK for Java. Use when implementing event streaming, high-throughput data ingestion, or building event-driven architectures.
Orchestrate data pipelines with Airflow. TaskFlow API, sensors, XCom, and scheduling.
Azure Event Hubs SDK for Rust. Use for sending and receiving events, streaming data ingestion. Triggers: "event hubs rust", "ProducerClient rust", "ConsumerClient rust", "send event rust", "streaming rust".
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.
This skill empowers Claude to preprocess and clean data using automated pipelines. It is designed to streamline data preparation for machine learning tasks, implementing best practices for data validation, transformation, and error handling. Claude should use this skill when the user requests data preprocessing, data cleaning, ETL tasks, or mentions the need for automated pipelines for data preparation. Trigger terms include "preprocess data", "clean data", "ETL pipeline", "data transformation"
Azure Event Hubs SDK for .NET. Use for high-throughput event streaming: sending events (EventHubProducerClient, EventHubBufferedProducerClient), receiving events (EventProcessorClient with checkpointing), partition management, and real-time data ingestion. Triggers: "Event Hubs", "event streaming", "EventHubProducerClient", "EventProcessorClient", "send events", "receive events", "checkpointing", "partition".
Writes and executes SQL queries against the data warehouse using dbt's Semantic Layer or ad-hoc SQL to answer business questions. Use when a user asks about analytics, metrics, KPIs, or data (e.g., "What were total sales last quarter?", "Show me top customers by revenue"). NOT for validating, testing, or building dbt models during development.
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.
Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms. Use PROACTIVELY for data pipeline design, analytics infrastructure, or modern data stack implementation.