found 287 skills in registry
Monitor, trace, debug, and evaluate LLM applications with LangSmith. Use when a user asks to trace LLM calls, debug chain executions, evaluate AI output quality, set up LLM observability, monitor agent performance, run prompt experiments, compare model outputs, create evaluation datasets, track token usage and latency, or build LLM testing pipelines. Covers tracing, datasets, evaluators, annotation queues, prompt hub, and production monitoring.
Assists with finding security vulnerabilities in web applications using OWASP ZAP. Use when configuring automated scans, writing scan policies, integrating security scanning into CI/CD pipelines, or analyzing results for OWASP Top 10 vulnerabilities like XSS, SQL injection, and CSRF. Trigger words: owasp zap, security scan, vulnerability scanner, penetration testing, zap-baseline, active scan, passive scan.
Automatically test APIs with Schemathesis by generating test cases from OpenAPI/GraphQL schemas. Use when tasks involve API fuzzing, finding edge cases in REST or GraphQL APIs, testing schema compliance, generating property-based tests from API specs, finding crashes and 500 errors, or validating API contracts. Schemathesis generates thousands of test cases from your schema and finds bugs that manual testing misses.
Run AI-powered penetration testing with PentAGI. Use when a user asks to automate security testing, set up autonomous pentesting, deploy an AI-driven vulnerability scanner, build a self-hosted security testing platform, or conduct penetration tests with LLM-powered agents.
You are an expert in Braintrust, the evaluation and observability platform for AI applications. You help developers run systematic evaluations, compare model versions, track experiments, log production traces, and measure quality metrics — with a focus on making AI development as rigorous as traditional software testing.
Write and maintain end-to-end tests with Playwright. Use when someone asks to "add e2e tests", "test my web app", "set up Playwright", "write browser tests", "test login flow", "visual regression testing", "test across browsers", or "automate UI testing". Covers test setup, page objects, authentication, API mocking, visual comparisons, and CI integration.
Test web application security with Burp Suite. Use when a user asks to intercept HTTP traffic, test for web vulnerabilities, fuzz API endpoints, analyze authentication flows, or perform manual web application pentesting.
Run DOM tests without a browser using Happy DOM, a fast JavaScript DOM implementation. Use when someone asks to "test without a browser", "Happy DOM", "fast DOM environment for tests", "replace jsdom", "Vitest DOM environment", or "server-side DOM". Covers test environment setup, Vitest/Jest integration, SSR testing, and performance comparison with jsdom.
Use when the user wants to implement consumer-driven contract testing between microservices using Pact. Also use when the user mentions "pact," "contract testing," "consumer-driven contracts," "CDC testing," "provider verification," or "Pact Broker." For API mocking, see mockoon or wiremock.
Brute force directories, files, DNS subdomains, and virtual hosts with Gobuster. Use when a user asks to discover hidden endpoints, enumerate subdomains, find backup files, or perform web content discovery during penetration testing.
Great Expectations is a Python framework for data quality testing and validation. Learn to define expectations, create validation suites, build data docs, and integrate with data pipelines for automated quality checks.
Design and execute growth experiments for rapid user acquisition and retention. Use when tasks involve viral loops, referral programs, conversion funnel optimization, A/B testing strategies, CAC/LTV analysis, product-led growth, activation rate improvement, cohort analysis, or scaling user growth through data-driven experimentation. Covers the full growth lifecycle from acquisition through retention.
Use when the user wants to write mobile UI tests using Maestro's simple YAML-based flow definitions. Also use when the user mentions "maestro," "mobile UI testing," "YAML mobile tests," "maestro flows," or "maestro studio." For React Native-specific gray-box testing, see detox. For cross-platform mobile automation, see appium.
Use when the user wants to write behavior-driven development (BDD) tests using Gherkin syntax and Cucumber step definitions. Also use when the user mentions "cucumber," "BDD," "Gherkin," "feature files," "given-when-then," "step definitions," or "behavior-driven." For contract testing, see pact.
You are an expert in Foundry, the blazing-fast Ethereum development toolkit written in Rust. You help developers write, test, deploy, and debug Solidity smart contracts using Forge (testing), Cast (CLI interactions), Anvil (local node), and Chisel (Solidity REPL) — with native Solidity testing (no JavaScript), fuzz testing, gas optimization, and fork testing against mainnet state.
Use when the user wants to perform visual regression testing with Storybook integration using Chromatic. Also use when the user mentions "chromatic," "visual regression," "Storybook testing," "UI review," "visual diff," or "component snapshot testing." For general screenshot comparison, see percy.
Use when the user wants to perform load testing using Python with Locust's distributed architecture and real-time web UI. Also use when the user mentions "locust," "Python load testing," "distributed load test," "locust web UI," or "locustfile." For JavaScript-based load testing, see k6 or artillery.
Expert guidance for DeepEval, the open-source framework for unit testing LLM applications. Helps developers write test cases, define custom metrics, and integrate LLM quality checks into CI/CD pipelines using a pytest-like interface.
Use when the user wants to write end-to-end tests for React Native apps using Detox's gray-box testing approach. Also use when the user mentions "detox," "React Native testing," "React Native E2E," "gray-box testing," or "Wix Detox." For general mobile testing, see appium. For simpler mobile UI flows, see maestro.
Test and debug Stripe payment integrations. Use when someone needs to verify webhook handling, simulate payment flows, debug failed charges, validate subscription lifecycle, or troubleshoot Stripe API errors. Trigger words: stripe, payment testing, webhook debugging, charge failed, subscription error, payment intent, checkout session.