> airbyte
You are an expert in Airbyte, the open-source data integration platform with 300+ pre-built connectors. You help developers sync data from SaaS tools, databases, and APIs into data warehouses and lakes — handling incremental syncs, CDC (Change Data Capture), schema evolution, and error recovery for production data pipelines.
Airbyte — Open-Source Data Integration Platform
Core Capabilities
Self-Hosted Setup
```bash
# Docker Compose (recommended for small-medium deployments)
git clone https://github.com/airbytehq/airbyte.git
cd airbyte && ./run-ab-platform.sh
# UI at http://localhost:8000

# Kubernetes (production)
helm repo add airbyte https://airbytehq.github.io/helm-charts
helm install airbyte airbyte/airbyte -n airbyte --create-namespace

# Cloud: https://cloud.airbyte.com (managed)
```
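Once the platform is up, you can verify it from code before wiring anything else. A minimal health-check sketch — the `base_url` default assumes the Docker Compose deployment above, and `/health` is the Config API's availability endpoint:

```python
import requests

def airbyte_healthy(base_url: str = "http://localhost:8000/api/v1") -> bool:
    """Return True if the Airbyte Config API reports itself available."""
    try:
        resp = requests.get(f"{base_url}/health", timeout=5)
        return bool(resp.ok and resp.json().get("available", False))
    except requests.RequestException:
        # Server not reachable yet (still booting, wrong port, etc.)
        return False
```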
Configuration via API
```python
# Create connections programmatically via the Airbyte Config API
import os

import requests

AIRBYTE_API = "http://localhost:8000/api/v1"
workspace_id = "..."  # placeholder: list workspaces via /workspaces/list

# Create a Stripe source
source = requests.post(f"{AIRBYTE_API}/sources/create", json={
    "workspaceId": workspace_id,
    "name": "Stripe Production",
    "sourceDefinitionId": "e094cb9a-26de-4645-8761-65c0c425d1de",  # Stripe
    "connectionConfiguration": {
        "account_id": "acct_xxx",
        "client_secret": os.environ["STRIPE_SECRET_KEY"],
        "start_date": "2025-01-01T00:00:00Z",
    },
}).json()

# Create a BigQuery destination
destination = requests.post(f"{AIRBYTE_API}/destinations/create", json={
    "workspaceId": workspace_id,
    "name": "BigQuery Warehouse",
    "destinationDefinitionId": "22f6c74f-5699-40ff-833c-4a879ea40133",  # BigQuery
    "connectionConfiguration": {
        "project_id": "my-project",
        "dataset_id": "raw_stripe",
        "credentials_json": os.environ["GCP_CREDENTIALS"],
        "loading_method": {"method": "GCS Staging", "gcs_bucket_name": "airbyte-staging"},
    },
}).json()

# Create connection (source → destination)
connection = requests.post(f"{AIRBYTE_API}/connections/create", json={
    "sourceId": source["sourceId"],
    "destinationId": destination["destinationId"],
    "syncCatalog": {
        "streams": [
            {
                "stream": {"name": "subscriptions", "namespace": "stripe"},
                "config": {
                    "syncMode": "incremental",
                    "destinationSyncMode": "append_dedup",
                    "cursorField": ["created"],
                    "primaryKey": [["id"]],
                },
            },
        ],
    },
    # Quartz-style cron (seconds field first)
    "schedule": {"scheduleType": "cron", "cronExpression": "0 */2 * * * ?"},
    "namespaceFormat": "raw_${SOURCE_NAMESPACE}",
}).json()
```
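After creating a connection, you typically trigger a sync and poll the resulting job. A sketch using the Config API's `/connections/sync` and `/jobs/get` endpoints; the polling loop takes an injectable `fetch_status` callable so its logic can be exercised without a live server, and `connection_id` is whatever `/connections/create` returned:

```python
import time

import requests

AIRBYTE_API = "http://localhost:8000/api/v1"

def trigger_sync(connection_id: str) -> str:
    """Start a manual sync and return the job id."""
    resp = requests.post(f"{AIRBYTE_API}/connections/sync",
                         json={"connectionId": connection_id})
    resp.raise_for_status()
    return resp.json()["job"]["id"]

def api_job_status(job_id: str) -> str:
    """Fetch the current status of a job from the API."""
    resp = requests.post(f"{AIRBYTE_API}/jobs/get", json={"id": job_id})
    resp.raise_for_status()
    return resp.json()["job"]["status"]

def wait_for_job(job_id: str, fetch_status, poll_seconds: int = 30) -> str:
    """Poll until the job leaves pending/running; returns the final status."""
    while True:
        status = fetch_status(job_id)
        if status not in ("pending", "running"):
            return status
        time.sleep(poll_seconds)

# Usage against a live instance:
#   job_id = trigger_sync(connection["connectionId"])
#   final = wait_for_job(job_id, api_job_status)
```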
Custom Connectors (CDK)
```python
# Build a custom source connector with the Airbyte CDK
from typing import Any, Iterable, Mapping, Optional

import requests

from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream


class InternalAPIStream(HttpStream):
    url_base = "https://api.internal.company.com/v1/"
    primary_key = "id"
    cursor_field = "updated_at"

    def path(self, **kwargs) -> str:
        return "events"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # Single page; implement pagination here if the API supports it
        return None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping[str, Any]]:
        for record in response.json()["data"]:
            yield record


class Source(AbstractSource):
    def check_connection(self, logger, config):
        # Verify API credentials work (e.g., make a cheap authenticated request)
        return True, None

    def streams(self, config) -> list[Stream]:
        # get_auth: user-defined helper that builds an authenticator from config
        return [InternalAPIStream(authenticator=self.get_auth(config))]
```
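For incremental streams, the connector also has to merge stream state as records arrive. The core rule — keep the greatest cursor value seen so far — is shown here as a standalone function so it runs without the CDK installed; inside a `Stream` subclass this is the logic you would put in a state-update hook such as `get_updated_state`:

```python
def merged_state(current_state: dict, latest_record: dict,
                 cursor_field: str = "updated_at") -> dict:
    """Keep the max cursor value between saved state and the latest record.

    Plain string comparison suffices because ISO-8601 timestamps
    sort lexicographically.
    """
    current = current_state.get(cursor_field) or ""
    latest = latest_record.get(cursor_field) or ""
    return {cursor_field: max(current, latest)}
```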
Installation
```bash
# Docker Compose
curl -o docker-compose.yaml https://raw.githubusercontent.com/airbytehq/airbyte/master/docker-compose.yaml
docker compose up -d

# Python CDK for custom connectors
pip install airbyte-cdk
```
Best Practices
- Incremental syncs — Use incremental mode for large tables; full refresh only for small reference tables
- CDC for databases — Use Change Data Capture (logical replication) for real-time PostgreSQL/MySQL syncs
- Staging area — Configure GCS/S3 staging for BigQuery/Snowflake destinations; direct insert is slow for large volumes
- Schema evolution — Airbyte handles new columns automatically; configure `auto_propagation` in connection settings
- Alerting — Set up webhook notifications for sync failures; integrate with Slack/PagerDuty
- Namespace per source — Use a `raw_${SOURCE}` namespace pattern; keeps raw data organized before dbt transforms
- Self-host for cost — Airbyte Cloud charges per row synced; self-hosting is free for unlimited data
- Custom connectors — Use the CDK for internal APIs; publish to Airbyte's connector marketplace for community use
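The namespace pattern above is plain template substitution. A sketch of how a `namespaceFormat` string resolves, using the `${SOURCE_NAMESPACE}` variable from the connection config earlier:

```python
def resolve_namespace(namespace_format: str, source_namespace: str) -> str:
    """Expand Airbyte's ${SOURCE_NAMESPACE} variable in a namespaceFormat string."""
    return namespace_format.replace("${SOURCE_NAMESPACE}", source_namespace)
```

So a Stripe source syncing through `raw_${SOURCE_NAMESPACE}` lands in a `raw_stripe` dataset, keeping raw tables clearly separated from transformed models.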