> senior-computer-vision
Object detection, image segmentation, and visual AI. Use when a user asks to detect objects in images, segment images, run YOLO inference, build a computer vision pipeline, use Faster R-CNN or SAM, deploy a vision model with TensorRT, or perform image classification. Covers YOLO, SAM, Mask R-CNN, and TensorRT optimization.
curl "https://skillshub.wtf/TerminalSkills/skills/senior-computer-vision?format=md"Senior Computer Vision
Overview
Build and deploy computer vision pipelines for object detection, image segmentation, and visual AI. Supports YOLO (v8/v11), Faster R-CNN, SAM (Segment Anything Model), and Mask R-CNN. Includes TensorRT optimization for production deployment with real-time inference.
Instructions
When a user asks for computer vision help, determine the task:
Task A: Object detection with YOLO
- Install ultralytics:
pip install ultralytics
- Run inference:
from ultralytics import YOLO
# Load a pretrained model
model = YOLO("yolo11n.pt") # nano (fastest) | s | m | l | x (most accurate)
# Detect objects in an image
results = model("image.jpg")
# Process results
for result in results:
boxes = result.boxes
for box in boxes:
cls = int(box.cls[0])
conf = float(box.conf[0])
label = model.names[cls]
x1, y1, x2, y2 = box.xyxy[0].tolist()
print(f"{label}: {conf:.2f} at [{x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f}]")
# Save annotated image
results[0].save("result.jpg")
- Run on video:
# Process video with tracking
results = model.track("video.mp4", show=False, save=True, tracker="bytetrack.yaml")
- Train a custom YOLO model:
model = YOLO("yolo11n.pt")
model.train(
data="dataset.yaml", # Path to dataset config
epochs=100,
imgsz=640,
batch=16,
device=0, # GPU index
patience=20, # Early stopping
)
Dataset YAML format:
path: ./dataset
train: images/train
val: images/val
names:
0: cat
1: dog
2: bird
Task B: Image segmentation with SAM
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator
# Load SAM model
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device="cuda")
# Point-based segmentation
predictor = SamPredictor(sam)
predictor.set_image(image) # numpy array (H, W, 3)
# Segment with a point prompt
masks, scores, logits = predictor.predict(
point_coords=np.array([[500, 375]]), # (x, y) coordinates
point_labels=np.array([1]), # 1 = foreground, 0 = background
multimask_output=True,
)
# Automatic mask generation (segment everything)
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)
# Each mask: {segmentation, area, bbox, predicted_iou, stability_score}
print(f"Found {len(masks)} segments")
Task C: Faster R-CNN and Mask R-CNN with torchvision
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, maskrcnn_resnet50_fpn_v2
from torchvision.transforms import functional as F
from PIL import Image
# Object detection with Faster R-CNN
det_model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT")
det_model.eval().cuda()
img = Image.open("image.jpg")
img_tensor = F.to_tensor(img).unsqueeze(0).cuda()
with torch.no_grad():
predictions = det_model(img_tensor)[0]
# Filter by confidence
threshold = 0.7
for i in range(len(predictions["scores"])):
if predictions["scores"][i] > threshold:
label = predictions["labels"][i].item()
score = predictions["scores"][i].item()
box = predictions["boxes"][i].tolist()
print(f"Class {label}: {score:.2f} at {box}")
# Instance segmentation with Mask R-CNN
seg_model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT")
seg_model.eval().cuda()
with torch.no_grad():
predictions = seg_model(img_tensor)[0]
# predictions["masks"] contains per-instance binary masks
Task D: TensorRT optimization for deployment
# Export YOLO to TensorRT
from ultralytics import YOLO
model = YOLO("yolo11n.pt")
model.export(format="engine", device=0, half=True) # Creates yolo11n.engine
# Run inference with TensorRT engine
trt_model = YOLO("yolo11n.engine")
results = trt_model("image.jpg")
For custom models:
import tensorrt as trt
import torch
# Export PyTorch model to ONNX first
torch.onnx.export(
model, dummy_input, "model.onnx",
opset_version=17,
input_names=["input"],
output_names=["output"],
dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
# Convert ONNX to TensorRT
# trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
Task E: Image classification
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights
from torchvision.transforms import functional as F
from PIL import Image
weights = EfficientNet_V2_S_Weights.DEFAULT
model = efficientnet_v2_s(weights=weights).eval().cuda()
preprocess = weights.transforms()
img = Image.open("image.jpg")
batch = preprocess(img).unsqueeze(0).cuda()
with torch.no_grad():
logits = model(batch)
probs = torch.softmax(logits, dim=1)[0]
top5 = torch.topk(probs, 5)
categories = weights.meta["categories"]
for score, idx in zip(top5.values, top5.indices):
print(f"{categories[idx]}: {score:.2%}")
Examples
Example 1: Count products on a shelf
User request: "Count how many bottles are on each shelf in this image"
model = YOLO("yolo11m.pt")
results = model("shelf.jpg", conf=0.5)
bottles = [b for b in results[0].boxes if model.names[int(b.cls[0])] == "bottle"]
print(f"Detected {len(bottles)} bottles")
results[0].save("shelf_annotated.jpg")
Example 2: Segment and extract a foreground object
User request: "Remove the background from this product photo"
Use SAM with a center-point prompt to segment the main object, then apply the mask to create a transparent PNG background.
Example 3: Real-time detection on a webcam
User request: "Run object detection on my webcam feed"
model = YOLO("yolo11n.pt")
results = model(source=0, show=True, conf=0.5) # source=0 for webcam
Guidelines
- Start with the smallest model variant (nano/small) and scale up only if accuracy is insufficient.
- Use TensorRT or ONNX Runtime for production deployments; they provide 2-5x speedup over PyTorch.
- For custom detection tasks, fine-tune YOLO on your dataset rather than training from scratch.
- Set confidence thresholds based on the application: 0.5 for general use, 0.7+ for high-precision needs.
- Use half-precision (FP16) inference on GPUs for nearly 2x speedup with minimal accuracy loss.
- Pre-process images to the model's expected resolution before inference for best results.
- For video processing, use batch inference and tracking (ByteTrack) for temporal consistency.
- Benchmark inference speed with
model.benchmark()(YOLO) before committing to a model size.
> related_skills --same-repo
> zustand
You are an expert in Zustand, the small, fast, and scalable state management library for React. You help developers manage global state without boilerplate using Zustand's hook-based stores, selectors for performance, middleware (persist, devtools, immer), computed values, and async actions — replacing Redux complexity with a simple, un-opinionated API in under 1KB.
> zoho
Integrate and automate Zoho products. Use when a user asks to work with Zoho CRM, Zoho Books, Zoho Desk, Zoho Projects, Zoho Mail, or Zoho Creator, build custom integrations via Zoho APIs, automate workflows with Deluge scripting, sync data between Zoho apps and external systems, manage leads and deals, automate invoicing, build custom Zoho Creator apps, set up webhooks, or manage Zoho organization settings. Covers Zoho CRM, Books, Desk, Projects, Creator, and cross-product integrations.
> zod
You are an expert in Zod, the TypeScript-first schema declaration and validation library. You help developers define schemas that validate data at runtime AND infer TypeScript types at compile time — eliminating the need to write types and validators separately. Used for API input validation, form validation, environment variables, config files, and any data boundary.
> zipkin
Deploy and configure Zipkin for distributed tracing and request flow visualization. Use when a user needs to set up trace collection, instrument Java/Spring or other services with Zipkin, analyze service dependencies, or configure storage backends for trace data.