> clickhouse-cost-tuning
Optimize ClickHouse Cloud costs — compute scaling, storage tiering, compression, and query efficiency for lower bills. Use when analyzing ClickHouse Cloud bills, reducing storage costs, or optimizing compute utilization. Trigger: "clickhouse cost", "clickhouse billing", "reduce clickhouse spend", "clickhouse pricing", "clickhouse expensive", "clickhouse storage cost".
ClickHouse Cost Tuning
Overview
Reduce ClickHouse Cloud costs through storage optimization, compression tuning, TTL policies, compute scaling, and query efficiency improvements.
Prerequisites
- ClickHouse Cloud account with billing access
- Understanding of current data volumes and query patterns
Instructions
Step 1: Understand ClickHouse Cloud Pricing
| Component | Pricing Model | Key Driver |
|---|---|---|
| Compute | Per-hour per replica | vCPU + memory tier |
| Storage | Per GB-month | Compressed data on disk |
| Network | Per GB egress | Query result sizes |
| Backups | Per GB stored | Backup retention |
Key insight: ClickHouse bills on compressed storage, and ClickHouse compresses extremely well (often 10-20x). Your cost driver is usually compute, not storage.
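To make the pricing model concrete, here is a minimal back-of-envelope sketch of the storage line item. The $0.10/GB-month figure is a placeholder, not an official price — substitute the rate from your own ClickHouse Cloud plan.

```typescript
// Back-of-envelope storage cost from raw volume and compression ratio.
const GB = 1024 ** 3;

function estimateStorageCostUSD(
  rawBytes: number,
  compressionRatio: number, // 10-20x is common for event data
  pricePerGBMonth = 0.1,    // assumed rate, not an official price
): number {
  const compressedGB = rawBytes / compressionRatio / GB;
  return compressedGB * pricePerGBMonth;
}

// 1 TB of raw events at 15x compression is only ~68 GB on disk
console.log(estimateStorageCostUSD(1024 * GB, 15).toFixed(2));
```

Running the numbers this way usually confirms the key insight above: at typical compression ratios, compute dwarfs storage on the bill.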
Step 2: Analyze Storage Usage
-- Storage cost breakdown by table
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS raw_size,
round(sum(data_uncompressed_bytes) / sum(bytes_on_disk), 1) AS compression_ratio,
sum(rows) AS total_rows,
count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
-- Storage by column (find bloated columns)
SELECT
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS raw,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics'
GROUP BY table, column, type
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 30;
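The per-column query above can feed a small triage helper. This is a sketch: it takes rows shaped like the query output and flags columns that are large on disk but compress poorly, the prime candidates for the codec changes in the next step. The thresholds are illustrative, not recommendations.

```typescript
// Flag large, poorly compressing columns from system.parts_columns output.
interface ColumnStats {
  column: string;
  compressedBytes: number;
  uncompressedBytes: number;
}

function poorlyCompressed(
  cols: ColumnStats[],
  minCompressedBytes = 1e9, // ignore columns under ~1 GB on disk
  minRatio = 5,             // flag anything compressing worse than 5x
): string[] {
  return cols
    .filter(c => c.compressedBytes >= minCompressedBytes)
    .filter(c => c.uncompressedBytes / c.compressedBytes < minRatio)
    .map(c => c.column);
}
```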
Step 3: Improve Compression
-- Check current codec per column
SELECT name, type, compression_codec
FROM system.columns
WHERE database = 'analytics' AND table = 'events';
-- Apply better codecs to large columns
ALTER TABLE analytics.events
MODIFY COLUMN properties String CODEC(ZSTD(3)); -- JSON blobs
ALTER TABLE analytics.events
MODIFY COLUMN created_at DateTime CODEC(DoubleDelta, ZSTD); -- Timestamps
ALTER TABLE analytics.events
MODIFY COLUMN user_id UInt64 CODEC(Delta, ZSTD); -- Sequential IDs
-- Force a full merge so the new codecs apply to existing parts
-- (expensive on large tables; otherwise codecs apply only as parts merge naturally)
OPTIMIZE TABLE analytics.events FINAL;
-- Check new compression ratio
SELECT
column,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics' AND table = 'events'
GROUP BY column ORDER BY sum(column_data_compressed_bytes) DESC;
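The codec choices above follow a simple pattern, sketched here as a heuristic. Treat the output as a starting point only — real data distribution decides what wins, so always re-measure the ratio after the merge.

```typescript
// Heuristic mapping column types to the codecs used in the ALTERs above.
function suggestCodec(columnType: string, isSequential = false): string {
  if (columnType.startsWith('DateTime')) return 'CODEC(DoubleDelta, ZSTD)'; // timestamps
  if (/^U?Int\d+$/.test(columnType) && isSequential) return 'CODEC(Delta, ZSTD)'; // sequential IDs
  if (columnType === 'String') return 'CODEC(ZSTD(3))'; // JSON blobs, free text
  return 'CODEC(ZSTD)'; // safe default
}

// e.g. assemble the DDL for a column:
// `ALTER TABLE analytics.events MODIFY COLUMN user_id UInt64 ${suggestCodec('UInt64', true)}`
```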
Step 4: TTL for Data Lifecycle
-- Expire old data automatically (reduces storage)
ALTER TABLE analytics.events
MODIFY TTL created_at + INTERVAL 90 DAY;
-- Move old data to cheaper storage tier (ClickHouse Cloud)
ALTER TABLE analytics.events
MODIFY TTL
created_at + INTERVAL 30 DAY TO VOLUME 'hot',
created_at + INTERVAL 90 DAY TO VOLUME 'cold',
created_at + INTERVAL 365 DAY DELETE;
-- Drop entire partitions manually (fastest way to delete bulk data)
ALTER TABLE analytics.events
DROP PARTITION '202401'; -- Drops January 2024
-- Check TTL status (the TTL clause appears in the table DDL)
SELECT database, name, create_table_query
FROM system.tables
WHERE database = 'analytics' AND create_table_query LIKE '%TTL%';
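Manual partition drops are easy to script. The sketch below generates the partition IDs past a retention window, assuming the table is `PARTITION BY toYYYYMM(created_at)` — other partition keys produce other ID formats, so adjust accordingly.

```typescript
// Generate toYYYYMM partition IDs older than the retention window,
// for use with DROP PARTITION in a scheduled cleanup job.
function expiredPartitionIds(
  now: Date,
  retentionMonths: number,
  lookbackMonths = 24, // how far back to generate candidates
): string[] {
  const ids: string[] = [];
  for (let back = lookbackMonths; back > retentionMonths; back--) {
    const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - back, 1));
    ids.push(`${d.getUTCFullYear()}${String(d.getUTCMonth() + 1).padStart(2, '0')}`);
  }
  return ids;
}

// Each ID becomes: ALTER TABLE analytics.events DROP PARTITION '<id>';
```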
Step 5: Compute Cost Reduction
-- ClickHouse Cloud: Scale compute dynamically
-- Configure in Cloud Console:
-- - Auto-scaling: min 2 / max 8 replicas
-- - Idle timeout: 5 minutes (auto-suspend when no queries)
-- - Use "Development" tier for staging environments
-- Reduce per-query compute consumption
SET max_threads = 4; -- Use fewer cores per query
SET max_memory_usage = 5000000000; -- 5GB cap per query
-- Server-side async inserts (reduces insert compute)
SET async_insert = 1;
SET async_insert_max_data_size = 10000000; -- Flush at 10MB
SET async_insert_busy_timeout_ms = 5000; -- or every 5 seconds
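Session-level SETs can also be applied per query from application code. The sketch below mirrors the caps above via the `clickhouse_settings` option of `@clickhouse/client`, so low-priority dashboards can't monopolize the service; the thresholds are illustrative, not recommendations.

```typescript
// Per-query resource caps, mirroring the SET examples above.
function lowPrioritySettings() {
  return {
    max_threads: 4,                 // fewer cores per query
    max_memory_usage: '5000000000', // 5 GB cap per query
  };
}

// Usage (assuming an @clickhouse/client instance):
// await client.query({ query: sql, clickhouse_settings: lowPrioritySettings() });
```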
Step 6: Query Efficiency = Lower Costs
-- Find the most expensive queries (by data scanned)
SELECT
normalized_query_hash,
count() AS executions,
formatReadableSize(sum(read_bytes)) AS total_read,
round(avg(query_duration_ms)) AS avg_ms,
any(substring(query, 1, 200)) AS sample
FROM system.query_log
WHERE type = 'QueryFinish'
AND event_time >= now() - INTERVAL 7 DAY
GROUP BY normalized_query_hash
ORDER BY sum(read_bytes) DESC
LIMIT 20;
-- Use materialized views to avoid repeated full scans
-- Instead of: SELECT count() FROM events WHERE date = today()
-- Pre-compute:
-- CREATE MATERIALIZED VIEW daily_counts_mv TO daily_counts AS
-- SELECT toDate(created_at) AS date, count() AS cnt FROM events GROUP BY date;
-- Then: SELECT cnt FROM daily_counts WHERE date = today()
-- Use PREWHERE to read less data
SELECT user_id, properties FROM analytics.events
PREWHERE event_type = 'purchase' -- Filter column read first; other columns read only for matching rows
WHERE created_at >= today() - 7;
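Whether a materialized view is worth adding can be sanity-checked with a rough break-even test: compare the bytes the raw query scans per day (from the query_log analysis above) against the small incremental work the MV adds on every insert. The 1% maintenance factor here is an assumption, not a measurement.

```typescript
// Rough break-even test for adding a materialized view.
function mvSavesBytes(
  executionsPerDay: number,
  bytesScannedPerQuery: number,
  dailyInsertBytes: number,
  maintenanceFactor = 0.01, // assumed MV overhead per inserted byte
): boolean {
  const scanned = executionsPerDay * bytesScannedPerQuery;     // what the raw queries cost today
  const maintenance = dailyInsertBytes * maintenanceFactor;    // what the MV would cost instead
  return scanned > maintenance;
}
```

Dashboard queries that run hundreds of times a day over the same partitions almost always clear this bar.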
Step 7: Monitor Costs
// Track query costs in your application
type ClickHouseClient = ReturnType<typeof import('@clickhouse/client').createClient>;

interface QueryCost { readRows: number; readBytes: number; durationMs: number }

async function queryWithCostTracking<T>(
  client: ClickHouseClient,
  sql: string,
): Promise<{ rows: T[]; cost: QueryCost }> {
  const start = Date.now();
  const rs = await client.query({ query: sql, format: 'JSONEachRow' });
  const rows = await rs.json<T>();
  const durationMs = Date.now() - start;
  // X-ClickHouse-Summary carries per-query counters as JSON strings,
  // e.g. {"read_rows":"8192","read_bytes":"1048576",...}
  const header = rs.response_headers['x-clickhouse-summary'];
  const summary = header ? JSON.parse(String(header)) : {};
  const cost: QueryCost = {
    readRows: Number(summary.read_rows ?? 0),
    readBytes: Number(summary.read_bytes ?? 0),
    durationMs,
  };
  // Log for cost analysis
  console.log({ query: sql.slice(0, 100), ...cost });
  return { rows, cost };
}
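The summary header is worth handling carefully: ClickHouse encodes every counter in X-ClickHouse-Summary as a JSON *string*, e.g. `{"read_rows":"8192","read_bytes":"1048576"}`. A small sketch of a normalizer for the two fields that matter for cost tracking:

```typescript
// Normalize the string-valued X-ClickHouse-Summary counters into numbers.
function parseClickHouseSummary(header: string): { readRows: number; readBytes: number } {
  const raw = JSON.parse(header) as Record<string, string>;
  return {
    readRows: Number(raw.read_rows ?? '0'),
    readBytes: Number(raw.read_bytes ?? '0'),
  };
}
```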
Cost Optimization Checklist
- Compression codecs applied to large columns (ZSTD, Delta, DoubleDelta)
- TTL configured for data expiration
- Auto-scaling and idle suspension enabled (Cloud)
- Development/staging on smaller tiers
- Materialized views for dashboard queries
- max_threads limited for non-critical queries
- async_insert enabled for high-frequency small inserts
- Monthly cost review with system.query_log analysis
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Storage growing fast | No TTL, no drops | Add TTL or schedule partition drops |
| High compute bill | Full-scan queries | Add materialized views, fix ORDER BY |
| Egress charges | Large result sets | Add LIMIT, use aggregations |
| Idle compute cost | No auto-suspend | Enable idle timeout in Cloud console |
Next Steps
For architecture patterns, see clickhouse-reference-architecture.