> ibis

Expert guidance for Ibis, the Python dataframe library that provides a pandas-like API but generates SQL for execution on any backend — DuckDB, PostgreSQL, BigQuery, Snowflake, Spark, and more. Helps developers write analytics code once and run it anywhere without rewriting SQL for each database.

fetch
$curl "https://skillshub.wtf/TerminalSkills/skills/ibis?format=md"
SKILL.mdibis

Ibis — Portable Python Analytics

Overview

Ibis, the Python dataframe library that provides a pandas-like API but generates SQL for execution on any backend — DuckDB, PostgreSQL, BigQuery, Snowflake, Spark, and more. Helps developers write analytics code once and run it anywhere without rewriting SQL for each database.

Instructions

Basic Usage

# src/analytics.py — Portable analytics with Ibis
import ibis
from ibis import _                         # Shorthand for column references

# Connect to a backend (DuckDB for local development)
con = ibis.duckdb.connect("analytics.duckdb")

# Or connect to production databases — same code, different backend
# con = ibis.postgres.connect(url="postgresql://...")
# con = ibis.bigquery.connect(project_id="my-project")
# con = ibis.snowflake.connect(...)

# Load data
orders = con.table("orders")

# Build a query — this is lazy (no execution until you call .execute())
monthly_revenue = (
    orders
    .filter(_.status == "completed")
    .filter(_.created_at >= "2026-01-01")
    .group_by(month=_.created_at.truncate("M"))
    .agg(
        revenue=_.amount.sum(),
        order_count=_.count(),
        unique_customers=_.customer_id.nunique(),
        avg_order_value=_.amount.mean(),
    )
    .order_by(_.month)
)

# Execute and get a pandas DataFrame
df = monthly_revenue.execute()
print(df)

# Or see the generated SQL
print(ibis.to_sql(monthly_revenue))

Complex Transformations

# Window functions, joins, and case expressions
import ibis
from ibis import _

con = ibis.duckdb.connect("analytics.duckdb")
orders = con.table("orders")
customers = con.table("customers")

# Window functions — running totals and rankings
ranked = (
    orders
    .filter(_.status == "completed")
    .group_by(_.customer_id)
    .agg(
        total_spent=_.amount.sum(),
        order_count=_.count(),
        first_order=_.created_at.min(),
        last_order=_.created_at.max(),
    )
    .mutate(
        # Rank customers by revenue
        revenue_rank=ibis.rank().over(
            order_by=ibis.desc(_.total_spent)
        ),
        # Percentile
        revenue_percentile=ibis.percent_rank().over(
            order_by=_.total_spent
        ),
        # Customer segment based on spending
        segment=ibis.case()
            .when(_.total_spent >= 1000, "whale")
            .when(_.total_spent >= 100, "regular")
            .else_("casual")
            .end(),
    )
)

# Joins
customer_analytics = (
    ranked
    .join(customers, _.customer_id == customers.id)
    .select(
        _.customer_id,
        customers.name,
        customers.email,
        customers.plan,
        _.total_spent,
        _.order_count,
        _.segment,
        _.revenue_rank,
        # Days since last order
        days_inactive=(ibis.now() - _.last_order).cast("int32") // 86400,
    )
)

# Cohort analysis
cohorts = (
    orders
    .filter(_.status == "completed")
    .group_by(_.customer_id)
    .mutate(
        cohort_month=_.created_at.min().truncate("M"),
    )
    .mutate(
        months_since=((_.created_at.truncate("M") - _.cohort_month)
                      .cast("int32") // (30 * 86400)),
    )
    .group_by(_.cohort_month, _.months_since)
    .agg(
        active_users=_.customer_id.nunique(),
        revenue=_.amount.sum(),
    )
)

Backend Portability

# The same analytics code runs on any backend
import ibis

def build_revenue_report(con: ibis.BaseBackend):
    """Build a revenue report — works on any Ibis backend.

    Args:
        con: Any Ibis connection (DuckDB, Postgres, BigQuery, etc.)
    """
    orders = con.table("orders")

    return (
        orders
        .filter(_.status == "completed")
        .group_by(
            month=_.created_at.truncate("M"),
            category=_.category,
        )
        .agg(
            revenue=_.amount.sum(),
            orders=_.count(),
        )
        .order_by(_.month.desc())
    )

# Development: DuckDB on local Parquet files
dev_con = ibis.duckdb.connect()
dev_con.read_parquet("data/orders.parquet", table_name="orders")
report = build_revenue_report(dev_con).execute()

# Production: BigQuery
prod_con = ibis.bigquery.connect(project_id="prod-project", dataset_id="analytics")
report = build_revenue_report(prod_con).execute()

# Testing: in-memory with DuckDB
test_con = ibis.duckdb.connect()
test_con.create_table("orders", test_data_df)
report = build_revenue_report(test_con).execute()

UDFs and Custom Functions

# Custom scalar and aggregate functions
import ibis
from ibis import udf

@udf.scalar.python
def normalize_email(email: str) -> str:
    """Normalize email addresses for deduplication."""
    local, domain = email.lower().split("@")
    # Remove dots and plus aliases from Gmail
    if domain in ("gmail.com", "googlemail.com"):
        local = local.split("+")[0].replace(".", "")
    return f"{local}@{domain}"

# Use in queries
customers = con.table("customers")
deduped = (
    customers
    .mutate(clean_email=normalize_email(_.email))
    .group_by(_.clean_email)
    .agg(
        count=_.count(),
        first_seen=_.created_at.min(),
    )
    .filter(_.count > 1)
)

Installation

# Core library
pip install ibis-framework

# With specific backends
pip install "ibis-framework[duckdb]"
pip install "ibis-framework[postgres]"
pip install "ibis-framework[bigquery]"
pip install "ibis-framework[snowflake]"
pip install "ibis-framework[pyspark]"

# Interactive mode (for notebooks)
ibis.options.interactive = True   # Auto-execute and display results

Examples

Example 1: Migrating a pandas pipeline to run on BigQuery

User request:

I have a pandas script that calculates cohort retention from our events table. Rewrite it using Ibis so it runs on BigQuery instead of loading everything into memory.

The agent rewrites the pandas code using Ibis expressions (ibis.bigquery.connect(), t.group_by(), _.mutate(), window functions with ibis.cumulative_window()), keeping the same logic but generating SQL that executes on BigQuery. The script goes from loading 50M rows into memory to pushing all computation to the warehouse.

Example 2: Building a portable analytics module with DuckDB for dev

User request:

Write an analytics module that computes daily active users and revenue per plan from Parquet files locally, but can switch to Snowflake in production.

The agent creates a module using ibis.duckdb.connect() for local development with Parquet files, writes composable Ibis expressions for DAU (t.select('user_id', 'event_date').distinct().group_by('event_date').count()) and revenue by plan, and adds a get_connection() function that switches to ibis.snowflake.connect() based on an environment variable — same analytics code, different backend.

Guidelines

  1. Write once, run anywhere — Define analytics logic with Ibis; swap backends by changing the connection, not the code
  2. Lazy by default — Ibis expressions are lazy; they only execute when you call .execute() or .to_pandas()
  3. DuckDB for development — Use DuckDB locally with Parquet files; switch to BigQuery/Snowflake for production
  4. Use _ for readabilityfrom ibis import _ gives you clean column references: _.amount.sum() vs orders.amount.sum()
  5. Generate SQL for debugging — Use ibis.to_sql(expr) to see the SQL being generated; helps debug unexpected results
  6. Functions for reuse — Wrap analytics logic in functions that take a connection; test with DuckDB, deploy on any backend
  7. Interactive mode in notebooks — Set ibis.options.interactive = True for immediate result display during exploration
  8. Type your schemas — Use ibis.schema() to define expected table schemas; catch type mismatches early

> related_skills --same-repo

> zustand

You are an expert in Zustand, the small, fast, and scalable state management library for React. You help developers manage global state without boilerplate using Zustand's hook-based stores, selectors for performance, middleware (persist, devtools, immer), computed values, and async actions — replacing Redux complexity with a simple, un-opinionated API in under 1KB.

> zoho

Integrate and automate Zoho products. Use when a user asks to work with Zoho CRM, Zoho Books, Zoho Desk, Zoho Projects, Zoho Mail, or Zoho Creator, build custom integrations via Zoho APIs, automate workflows with Deluge scripting, sync data between Zoho apps and external systems, manage leads and deals, automate invoicing, build custom Zoho Creator apps, set up webhooks, or manage Zoho organization settings. Covers Zoho CRM, Books, Desk, Projects, Creator, and cross-product integrations.

> zod

You are an expert in Zod, the TypeScript-first schema declaration and validation library. You help developers define schemas that validate data at runtime AND infer TypeScript types at compile time — eliminating the need to write types and validators separately. Used for API input validation, form validation, environment variables, config files, and any data boundary.

> zipkin

Deploy and configure Zipkin for distributed tracing and request flow visualization. Use when a user needs to set up trace collection, instrument Java/Spring or other services with Zipkin, analyze service dependencies, or configure storage backends for trace data.

┌ stats

installs/wk0
░░░░░░░░░░
github stars17
███░░░░░░░
first seenMar 17, 2026
└────────────

┌ repo

TerminalSkills/skills
by TerminalSkills
└────────────

┌ tags

└────────────