> scikit-learn
Assists with building, evaluating, and deploying machine learning models using scikit-learn. Use when performing data preprocessing, feature engineering, model selection, hyperparameter tuning, cross-validation, or building pipelines for classification, regression, and clustering tasks. Trigger words: sklearn, scikit-learn, machine learning, classification, regression, pipeline, cross-validation.
Overview
Scikit-learn is a Python machine learning library that provides a consistent API for the full ML workflow: data preprocessing (scaling, encoding, imputation), model selection (classification, regression, clustering), hyperparameter tuning (grid search, randomized search), cross-validation, and pipeline construction. It supports serialization via joblib for production deployment.
Instructions
- When preprocessing data, use `ColumnTransformer` to apply different transformers to numeric and categorical columns (`StandardScaler`, `OneHotEncoder`, `SimpleImputer`), always within a `Pipeline` to prevent data leakage.
- When choosing models, start with fast baselines (`LogisticRegression`, `RandomForestClassifier`) and use `HistGradientBoostingClassifier` for the best tabular performance; it handles missing values natively and is faster than `GradientBoostingClassifier`.
- When evaluating, use `cross_val_score` with 5-fold CV instead of a single train/test split, and use `classification_report()` instead of accuracy alone, since accuracy is misleading on imbalanced datasets.
- When tuning hyperparameters, use `RandomizedSearchCV` when the search space exceeds roughly 100 combinations (faster than exhaustive `GridSearchCV`), and use `StratifiedKFold` or `TimeSeriesSplit` as appropriate.
- When building pipelines, chain preprocessing and model steps with `Pipeline` to ensure transformers fit only on training data, then serialize the full pipeline with `joblib.dump()` for deployment.
- When selecting features, use `permutation_importance()` for model-agnostic measurement, `SelectKBest` for statistical filtering, or `feature_importances_` from tree-based models.
Examples
Example 1: Build a customer churn prediction pipeline
User request: "Create a model to predict which customers will churn"
Actions:
- Build a `ColumnTransformer` with `StandardScaler` for numeric features and `OneHotEncoder` for categorical features
- Create a `Pipeline` with the transformer and `HistGradientBoostingClassifier`
- Tune hyperparameters with `RandomizedSearchCV` using `StratifiedKFold`
- Evaluate with `classification_report()`, focusing on recall for the churn class
Output: A tuned churn prediction pipeline with preprocessing, model, and evaluation metrics.
Example 2: Cluster customers into segments
User request: "Segment customers based on purchasing behavior"
Actions:
- Preprocess features with `StandardScaler` in a pipeline
- Use `KMeans` with silhouette score analysis to determine the optimal cluster count
- Run `PCA` for dimensionality reduction and visualization
- Profile clusters with `groupby` on the original features to interpret segments
Output: Customer segments with labeled profiles and a visual cluster map.
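The clustering steps above can be sketched like this, with synthetic blobs standing in for purchasing-behavior features (the candidate range of 2 to 6 clusters is an arbitrary assumption for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for purchasing-behavior features.
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Pick the cluster count with the best silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
best_k = max(scores, key=scores.get)

# Final clustering plus a 2-D projection for a visual cluster map.
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)
coords = PCA(n_components=2).fit_transform(X_scaled)
print(best_k, coords.shape)
```

The silhouette score rewards tight, well-separated clusters; scanning a small range of `k` and keeping the best score is a simple alternative to eyeballing an elbow plot. The PCA coordinates can then be scattered with the labels as colors.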
Guidelines
- Always use `Pipeline` to prevent data leakage by fitting transformers only on training data.
- Use `ColumnTransformer` for mixed data types: numeric scaling and categorical encoding in one object.
- Use `HistGradientBoostingClassifier` over `GradientBoostingClassifier`; it is faster and handles missing values natively.
- Use `cross_val_score` with 5-fold CV rather than a single train/test split, since single splits are noisy.
- Use `RandomizedSearchCV` when the search space exceeds roughly 100 combinations.
- Use `classification_report()`, not accuracy alone, which is misleading on imbalanced datasets.
- Serialize the full pipeline with `joblib`, not just the model, since deployment needs the preprocessing too.
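The last guideline can be sketched as follows; the file name is a hypothetical example, and a temporary directory stands in for wherever your deployment artifacts live:

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

# Persist the WHOLE pipeline so deployment gets preprocessing too;
# dumping only the model would lose the fitted scaler.
path = os.path.join(tempfile.mkdtemp(), "model_pipeline.joblib")
joblib.dump(pipe, path)

restored = joblib.load(path)
print(restored.predict(X[:3]))  # no refitting needed at serving time
```

The restored object carries the fitted scaler's mean and variance along with the model coefficients, so serving code only calls `predict()` on raw feature rows.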