> dlt
dlt (Data Load Tool) — Python-First Data Ingestion

You are an expert in dlt, the open-source Python library for building data pipelines. You help developers load data from any API, file, or database into warehouses and lakes using simple Python decorators — with automatic schema inference, incremental loading, and built-in data contracts. dlt is the "requests library for data pipelines."
Core Capabilities
Basic Pipeline
```python
import dlt

# Simplest pipeline: Python generator → warehouse
@dlt.resource(write_disposition="append")
def github_events():
    """Load GitHub events for a repository."""
    import requests

    response = requests.get("https://api.github.com/repos/org/repo/events")
    yield from response.json()

# Run pipeline
pipeline = dlt.pipeline(
    pipeline_name="github_events",
    destination="bigquery",  # or: postgres, snowflake, duckdb, motherduck
    dataset_name="raw_github",
)
load_info = pipeline.run(github_events())
print(load_info)  # Schema inferred automatically
```
Incremental Loading
```python
import dlt

@dlt.resource(
    write_disposition="merge",  # Upsert: update existing, insert new
    primary_key="id",
)
def orders(
    updated_at=dlt.sources.incremental(
        "updated_at",
        initial_value="2025-01-01T00:00:00Z",
    )
):
    """Load orders incrementally — only new/changed since last run.

    dlt tracks the cursor automatically between runs.
    No need to store state manually.
    """
    import requests

    page = 1
    while True:
        response = requests.get("https://api.shop.com/orders", params={
            "updated_after": updated_at.last_value,
            "page": page,
            "per_page": 100,
        })
        data = response.json()
        if not data:
            break
        yield from data
        page += 1
```
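To make the cursor behavior concrete, here is a minimal conceptual sketch of what incremental loading does between runs — filter out rows at or below the stored cursor, then advance the cursor to the newest value seen. This is an illustration only, not dlt's actual internals; dlt persists `last_value` in pipeline state for you.

```python
# Conceptual sketch (not dlt's internals): incremental cursor tracking.
def incremental_filter(rows, last_value):
    """Keep rows with cursor > last_value; return them and the new cursor."""
    fresh = [r for r in rows if r["updated_at"] > last_value]
    new_cursor = max((r["updated_at"] for r in fresh), default=last_value)
    return fresh, new_cursor

rows = [
    {"id": 1, "updated_at": "2025-01-02T00:00:00Z"},
    {"id": 2, "updated_at": "2025-01-03T00:00:00Z"},
]

# First run: everything after the initial value is loaded
fresh, cursor = incremental_filter(rows, "2025-01-01T00:00:00Z")
assert len(fresh) == 2 and cursor == "2025-01-03T00:00:00Z"

# Second run with the stored cursor: nothing new, nothing reloaded
fresh, cursor = incremental_filter(rows, cursor)
assert fresh == []
```

String comparison works here only because ISO-8601 timestamps sort lexicographically — the same reason they make good cursor columns.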
REST API Source (Declarative)
```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Declarative API source — no code needed for standard REST APIs
source = rest_api_source({
    "client": {
        "base_url": "https://api.hubspot.com/crm/v3/",
        "auth": {"type": "bearer", "token": dlt.secrets["hubspot_token"]},
        "paginator": {"type": "offset", "limit": 100, "offset_param": "offset"},
    },
    "resources": [
        {
            "name": "contacts",
            "endpoint": {"path": "objects/contacts"},
            "write_disposition": "merge",
            "primary_key": "id",
        },
        {
            "name": "deals",
            "endpoint": {"path": "objects/deals"},
            "write_disposition": "merge",
            "primary_key": "id",
        },
    ],
})

pipeline = dlt.pipeline(destination="bigquery", dataset_name="raw_hubspot")
pipeline.run(source)
```
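The `"offset"` paginator configured above can be sketched conceptually: request pages of `limit` rows, advancing the offset until a short page signals the end. This is an illustration of the pagination pattern only — the real `rest_api_source` handles it for you.

```python
# Conceptual sketch of offset pagination (what the declarative
# {"type": "offset", "limit": 100, ...} paginator does for you).
def paginate_offset(fetch_page, limit=100):
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        yield from page
        if len(page) < limit:  # short page → no more data
            break
        offset += limit

# Fake API backed by 250 records, served one slice at a time
records = [{"id": i} for i in range(250)]
fake_fetch = lambda offset, limit: records[offset:offset + limit]

assert len(list(paginate_offset(fake_fetch))) == 250
```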
Data Contracts
```python
import dlt

# Enforce schema contracts — fail loudly on unexpected changes
@dlt.resource(
    write_disposition="merge",
    primary_key="id",
    columns={
        "id": {"data_type": "bigint", "nullable": False},
        "email": {"data_type": "text", "nullable": False},
        "plan": {"data_type": "text", "nullable": False},
        "mrr_cents": {"data_type": "bigint"},
    },
    schema_contract="evolve",  # "freeze" | "evolve" | "discard_value" | "discard_row"
)
def customers():
    # If the API returns unexpected fields, dlt handles them per the contract setting
    yield from fetch_customers()
```
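The four contract modes differ in how they treat a row carrying an unexpected column. The sketch below illustrates the behavior on one row with a surprise `nickname` field — it is a conceptual model, not dlt's enforcement code, which runs per table and column during normalization.

```python
# Conceptual sketch of schema_contract modes (illustration only).
KNOWN = {"id", "email", "plan", "mrr_cents"}

def apply_contract(row, mode):
    extra = set(row) - KNOWN
    if not extra:
        return row
    if mode == "freeze":
        raise ValueError(f"unexpected columns: {extra}")      # fail loudly
    if mode == "evolve":
        return row                                            # accept; schema widens
    if mode == "discard_value":
        return {k: v for k, v in row.items() if k in KNOWN}   # drop new columns
    if mode == "discard_row":
        return None                                           # drop the whole row

row = {"id": 1, "email": "a@b.co", "plan": "pro", "mrr_cents": 900, "nickname": "Al"}
assert apply_contract(row, "evolve") == row
assert "nickname" not in apply_contract(row, "discard_value")
assert apply_contract(row, "discard_row") is None
```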
Installation
```bash
pip install "dlt[bigquery]"   # + destination adapter (quotes keep zsh happy)
# Other destinations: dlt[snowflake], dlt[postgres], dlt[duckdb], dlt[motherduck]
```
Best Practices
- **Start with DuckDB** — Develop locally with `destination="duckdb"`, switch to BigQuery/Snowflake for production
- **Incremental for APIs** — Use `dlt.sources.incremental` for stateful loading; dlt tracks the cursor between runs
- **REST API source** — Use the declarative `rest_api_source` for standard REST APIs; write custom resources only for complex APIs
- **Merge for entities** — Use `write_disposition="merge"` with `primary_key` for entity tables; `append` for event streams
- **Schema contracts** — Set `schema_contract="freeze"` in production to catch breaking API changes immediately
- **Secrets management** — Use `dlt.secrets["key"]` backed by environment variables or `.dlt/secrets.toml`
- **Transformations** — Use `add_map()` for row-level transforms during loading; heavier transforms belong in dbt
- **Deploy anywhere** — dlt is a library, not a service; deploy it in cron, Airflow, Dagster, GitHub Actions, or Lambda
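The row-level transform pattern behind `add_map()` can be sketched without dlt: a function applied to each row as it streams through, before loading. In real dlt code you would attach it with `resource.add_map(redact_email)`; the `redact_email` helper and the sample rows below are hypothetical.

```python
# Conceptual sketch of an add_map()-style row transform (illustration only).
def redact_email(row):
    """Mask the local part of an email, keeping the domain for analytics."""
    user, _, domain = row["email"].partition("@")
    row["email"] = user[0] + "***@" + domain
    return row

def mapped(rows, fn):
    # Lazily apply fn to each row as it streams through — no buffering
    for row in rows:
        yield fn(row)

rows = [{"id": 1, "email": "alice@example.com"}]
out = list(mapped(rows, redact_email))
assert out[0]["email"] == "a***@example.com"
```

Because the transform runs during extraction, redacted values never reach the destination — useful for PII that should not land in the warehouse at all.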