> firecrawl
Convert any website into clean, structured data with Firecrawl — API-first web scraping service. Use when someone asks to "turn a website into markdown", "scrape website for LLM", "Firecrawl", "extract website content as clean text", "crawl and convert to structured data", or "scrape website for RAG". Covers single-page scraping, full-site crawling, structured extraction, and LLM-ready output.
Firecrawl
Overview
Firecrawl is an API that scrapes websites and returns clean, LLM-ready content. Point it at any URL and get back markdown, HTML, or structured data — no selectors to write, no anti-bot handling, no browser management. It handles JavaScript rendering, proxy rotation, and content extraction automatically. Built for feeding web content into LLMs, RAG pipelines, and data workflows.
When to Use
- Extracting website content for RAG (Retrieval-Augmented Generation)
- Converting web pages to clean markdown for LLM consumption
- Crawling entire sites and getting structured content
- Scraping without managing browsers, proxies, or anti-bot
- Extracting structured data (products, articles) with LLM-powered extraction
Instructions
Setup
```shell
npm install @mendable/firecrawl-js
# Or Python: pip install firecrawl-py
# Self-hosted: docker run -p 3002:3002 mendableai/firecrawl
```
Single Page Scrape
```typescript
// scrape.ts — Convert any URL to clean markdown
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY,
  // apiUrl: "http://localhost:3002" // For self-hosted
});

// Scrape a single page
const result = await firecrawl.scrapeUrl("https://docs.example.com/getting-started", {
  formats: ["markdown", "html"], // Get both formats
});

console.log(result.markdown); // Clean markdown content
console.log(result.metadata); // Title, description, language, etc.
```
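When the scraped text is destined for an LLM prompt, it helps to keep the title and source URL attached to the markdown. A minimal helper for that, as a sketch — the front-matter layout is my own choice, not a Firecrawl convention, and the metadata field names simply mirror the ones used above:

```typescript
// toLlmDoc.ts — prepend source metadata to scraped markdown so
// downstream prompts retain provenance.
interface PageMetadata {
  title?: string;
  sourceURL?: string;
  description?: string;
}

// Build one text blob: a small front-matter header, then the markdown.
function toLlmDoc(markdown: string, metadata: PageMetadata): string {
  const header = [
    "---",
    `title: ${metadata.title ?? "(untitled)"}`,
    `source: ${metadata.sourceURL ?? "(unknown)"}`,
    "---",
  ].join("\n");
  return `${header}\n\n${markdown}`;
}

const doc = toLlmDoc("# Getting Started\n...", {
  title: "Getting Started",
  sourceURL: "https://docs.example.com/getting-started",
});
```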
Full Site Crawl
```typescript
// crawl.ts — Crawl an entire site
const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
  limit: 100, // Max pages to crawl
  scrapeOptions: {
    formats: ["markdown"],
  },
});

// Process all pages
for (const page of crawlResult.data) {
  console.log(`${page.metadata.title}: ${page.markdown.length} chars`);
  // Feed into your RAG pipeline, vector DB, etc.
}
```
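Crawls usually return more pages than a pipeline needs (legal pages, changelogs, near-empty stubs). A small pre-filter over the returned data array might look like this — the CrawledPage shape is an assumption based on the fields used above, and the thresholds are illustrative:

```typescript
// filterPages.ts — drop thin or unwanted pages before ingestion.
interface CrawledPage {
  markdown: string;
  metadata: { title?: string; sourceURL?: string };
}

// Keep pages with enough content whose URL is not in an excluded section.
function filterPages(
  pages: CrawledPage[],
  opts: { minChars: number; excludePatterns: RegExp[] }
): CrawledPage[] {
  return pages.filter((page) => {
    if (page.markdown.length < opts.minChars) return false;
    const url = page.metadata.sourceURL ?? "";
    return !opts.excludePatterns.some((re) => re.test(url));
  });
}

const kept = filterPages(
  [
    { markdown: "# Guide\n".repeat(50), metadata: { sourceURL: "https://docs.example.com/guide" } },
    { markdown: "stub", metadata: { sourceURL: "https://docs.example.com/legal/terms" } },
  ],
  { minChars: 200, excludePatterns: [/\/legal\//] }
);
```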
Structured Data Extraction
```typescript
// extract.ts — Extract structured data using LLM
import FirecrawlApp from "@mendable/firecrawl-js";
import { z } from "zod";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  currency: z.string(),
  rating: z.number().optional(),
  inStock: z.boolean(),
  features: z.array(z.string()),
});

const result = await firecrawl.scrapeUrl("https://shop.example.com/product/123", {
  formats: ["extract"],
  extract: {
    schema: ProductSchema,
  },
});

console.log(result.extract);
// { name: "Widget Pro", price: 49.99, currency: "USD", rating: 4.5, inStock: true, features: [...] }
```
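LLM-powered extraction can occasionally return malformed fields, so it is worth validating before the data enters your pipeline. With zod already imported, `ProductSchema.safeParse(result.extract)` is the natural check; a dependency-free equivalent, shown here purely for illustration (`isProduct` is a hypothetical helper, not part of the Firecrawl SDK):

```typescript
// guardExtract.ts — defensive check on LLM-extracted data before use.
interface Product {
  name: string;
  price: number;
  currency: string;
  inStock: boolean;
  features: string[];
  rating?: number;
}

// Narrow an unknown value to Product by checking each required field.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "number" &&
    typeof v.currency === "string" &&
    typeof v.inStock === "boolean" &&
    Array.isArray(v.features) &&
    v.features.every((f) => typeof f === "string")
  );
}

const ok = isProduct({ name: "Widget Pro", price: 49.99, currency: "USD", inStock: true, features: ["a"] });
const bad = isProduct({ name: "Widget Pro", price: "49.99" }); // price is a string
```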
Build a RAG Knowledge Base
```typescript
// rag-ingest.ts — Crawl docs site and ingest into vector DB
import FirecrawlApp from "@mendable/firecrawl-js";
import { ChromaClient } from "chromadb";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const chroma = new ChromaClient();
const collection = await chroma.getOrCreateCollection({ name: "docs" });

// Crawl documentation site
const crawl = await firecrawl.crawlUrl("https://docs.myproduct.com", {
  limit: 500,
  scrapeOptions: { formats: ["markdown"] },
});

// Chunk and store in vector DB
for (const page of crawl.data) {
  const chunks = splitIntoChunks(page.markdown, 1000); // 1000 char chunks
  await collection.add({
    ids: chunks.map((_, i) => `${page.metadata.sourceURL}-chunk-${i}`),
    documents: chunks,
    metadatas: chunks.map(() => ({
      source: page.metadata.sourceURL,
      title: page.metadata.title,
    })),
  });
}

function splitIntoChunks(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
```
Examples
Example 1: Build a docs chatbot
User prompt: "I want a chatbot that answers questions about my product documentation."
The agent will use Firecrawl to crawl the docs site, convert to markdown, chunk the content, store in a vector database, and build a RAG query pipeline.
Example 2: Monitor competitor content changes
User prompt: "Track when our competitor updates their pricing page."
The agent will schedule periodic Firecrawl scrapes, compare markdown diffs between runs, and alert on significant changes.
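For the monitoring flow above, a full text diff is often unnecessary: hashing the normalized markdown from each run is enough to detect that something changed. A sketch — the whitespace normalization rule is an assumption you may want to tune for the pages you track:

```typescript
// detectChange.ts — compare successive scrapes of the same page by hash.
import { createHash } from "node:crypto";

function contentHash(markdown: string): string {
  // Normalize whitespace so trivial reformatting doesn't trigger alerts.
  const normalized = markdown.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

function hasChanged(previousHash: string, markdown: string): boolean {
  return contentHash(markdown) !== previousHash;
}

const baseline = contentHash("## Pricing\nPro: $49/mo");
const same = hasChanged(baseline, "## Pricing\nPro:   $49/mo"); // whitespace only
const changed = hasChanged(baseline, "## Pricing\nPro: $59/mo"); // price changed
```

Persist the hash between runs (a file, a database row) and alert only when `hasChanged` returns true.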
Guidelines
- `scrapeUrl` for single pages — fast, returns markdown + metadata
- `crawlUrl` for entire sites — follows links, respects limits
- Markdown is the best LLM format — cleaner than HTML, preserves structure
- Structured extraction for data — use Zod/JSON schema to extract typed data
- Self-host for privacy — `docker run mendableai/firecrawl` for sensitive data
- Rate limits on cloud API — 500 pages/min on free tier
- Chunk markdown for RAG — 500-1500 char chunks with overlap work best
- Cache results — don't re-scrape unchanged pages
- `formats` array — request only what you need (markdown, html, extract)
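The splitIntoChunks helper shown earlier has no overlap; per the chunking guideline, a variant where neighboring chunks share a tail could look like this (the default sizes are starting points, not Firecrawl recommendations):

```typescript
// chunkWithOverlap.ts — fixed-size chunks that share `overlap` chars with
// their neighbor, so sentences cut at a boundary still appear whole in
// at least one chunk.
function chunkWithOverlap(text: string, size = 1000, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
    if (i + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

const chunks = chunkWithOverlap("a".repeat(2500), 1000, 200);
// 3 chunks: 0-1000, 800-1800, 1600-2500
```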