> firecrawl
Convert any website into clean, structured data with Firecrawl — API-first web scraping service. Use when someone asks to "turn a website into markdown", "scrape website for LLM", "Firecrawl", "extract website content as clean text", "crawl and convert to structured data", or "scrape website for RAG". Covers single-page scraping, full-site crawling, structured extraction, and LLM-ready output.
Firecrawl
Overview
Firecrawl is an API that scrapes websites and returns clean, LLM-ready content. Point it at any URL and get back markdown, HTML, or structured data — no selectors to write, no anti-bot handling, no browser management. It handles JavaScript rendering, proxy rotation, and content extraction automatically. Built for feeding web content into LLMs, RAG pipelines, and data workflows.
When to Use
- Extracting website content for RAG (Retrieval-Augmented Generation)
- Converting web pages to clean markdown for LLM consumption
- Crawling entire sites and getting structured content
- Scraping without managing browsers, proxies, or anti-bot
- Extracting structured data (products, articles) with LLM-powered extraction
Instructions
Setup
```shell
npm install @mendable/firecrawl-js
# Or Python: pip install firecrawl-py
# Self-hosted: docker run -p 3002:3002 mendableai/firecrawl
```
Single Page Scrape
```typescript
// scrape.ts — Convert any URL to clean markdown
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY,
  // apiUrl: "http://localhost:3002" // For self-hosted
});

// Scrape a single page
const result = await firecrawl.scrapeUrl("https://docs.example.com/getting-started", {
  formats: ["markdown", "html"], // Get both formats
});

console.log(result.markdown); // Clean markdown content
console.log(result.metadata); // Title, description, language, etc.
```
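When the scraped text is destined for an LLM prompt, it helps to keep the title and source URL attached to the markdown. A minimal helper for that, as a sketch — the front-matter layout is my own choice, not a Firecrawl convention, and the metadata field names simply mirror the ones used above:

```typescript
// toLlmDoc.ts — prepend source metadata to scraped markdown so
// downstream prompts retain provenance.
interface PageMetadata {
  title?: string;
  sourceURL?: string;
  description?: string;
}

// Build one text blob: a small front-matter header, then the markdown.
function toLlmDoc(markdown: string, metadata: PageMetadata): string {
  const header = [
    "---",
    `title: ${metadata.title ?? "(untitled)"}`,
    `source: ${metadata.sourceURL ?? "(unknown)"}`,
    "---",
  ].join("\n");
  return `${header}\n\n${markdown}`;
}

const doc = toLlmDoc("# Getting Started\n...", {
  title: "Getting Started",
  sourceURL: "https://docs.example.com/getting-started",
});
```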
Full Site Crawl
```typescript
// crawl.ts — Crawl an entire site
const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
  limit: 100, // Max pages to crawl
  scrapeOptions: {
    formats: ["markdown"],
  },
});

// Process all pages
for (const page of crawlResult.data) {
  console.log(`${page.metadata.title}: ${page.markdown.length} chars`);
  // Feed into your RAG pipeline, vector DB, etc.
}
```
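Crawls usually return more pages than a pipeline needs (legal pages, changelogs, near-empty stubs). A small pre-filter over the returned data array might look like this — the CrawledPage shape is an assumption based on the fields used above, and the thresholds are illustrative:

```typescript
// filterPages.ts — drop thin or unwanted pages before ingestion.
interface CrawledPage {
  markdown: string;
  metadata: { title?: string; sourceURL?: string };
}

// Keep pages with enough content whose URL is not in an excluded section.
function filterPages(
  pages: CrawledPage[],
  opts: { minChars: number; excludePatterns: RegExp[] }
): CrawledPage[] {
  return pages.filter((page) => {
    if (page.markdown.length < opts.minChars) return false;
    const url = page.metadata.sourceURL ?? "";
    return !opts.excludePatterns.some((re) => re.test(url));
  });
}

const kept = filterPages(
  [
    { markdown: "# Guide\n".repeat(50), metadata: { sourceURL: "https://docs.example.com/guide" } },
    { markdown: "stub", metadata: { sourceURL: "https://docs.example.com/legal/terms" } },
  ],
  { minChars: 200, excludePatterns: [/\/legal\//] }
);
```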
Structured Data Extraction
```typescript
// extract.ts — Extract structured data using LLM
import FirecrawlApp from "@mendable/firecrawl-js";
import { z } from "zod";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  currency: z.string(),
  rating: z.number().optional(),
  inStock: z.boolean(),
  features: z.array(z.string()),
});

const result = await firecrawl.scrapeUrl("https://shop.example.com/product/123", {
  formats: ["extract"],
  extract: {
    schema: ProductSchema,
  },
});

console.log(result.extract);
// { name: "Widget Pro", price: 49.99, currency: "USD", rating: 4.5, inStock: true, features: [...] }
```
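LLM-powered extraction can occasionally return malformed fields, so it is worth validating before the data enters your pipeline. With zod already imported, `ProductSchema.safeParse(result.extract)` is the natural check; a dependency-free equivalent, shown here purely for illustration (`isProduct` is a hypothetical helper, not part of the Firecrawl SDK):

```typescript
// guardExtract.ts — defensive check on LLM-extracted data before use.
interface Product {
  name: string;
  price: number;
  currency: string;
  inStock: boolean;
  features: string[];
  rating?: number;
}

// Narrow an unknown value to Product by checking each required field.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "number" &&
    typeof v.currency === "string" &&
    typeof v.inStock === "boolean" &&
    Array.isArray(v.features) &&
    v.features.every((f) => typeof f === "string")
  );
}

const ok = isProduct({ name: "Widget Pro", price: 49.99, currency: "USD", inStock: true, features: ["a"] });
const bad = isProduct({ name: "Widget Pro", price: "49.99" }); // price is a string
```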
Build a RAG Knowledge Base
```typescript
// rag-ingest.ts — Crawl docs site and ingest into vector DB
import FirecrawlApp from "@mendable/firecrawl-js";
import { ChromaClient } from "chromadb";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const chroma = new ChromaClient();
const collection = await chroma.getOrCreateCollection({ name: "docs" });

// Crawl documentation site
const crawl = await firecrawl.crawlUrl("https://docs.myproduct.com", {
  limit: 500,
  scrapeOptions: { formats: ["markdown"] },
});

// Chunk and store in vector DB
for (const page of crawl.data) {
  const chunks = splitIntoChunks(page.markdown, 1000); // 1000 char chunks
  await collection.add({
    ids: chunks.map((_, i) => `${page.metadata.sourceURL}-chunk-${i}`),
    documents: chunks,
    metadatas: chunks.map(() => ({
      source: page.metadata.sourceURL,
      title: page.metadata.title,
    })),
  });
}

function splitIntoChunks(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
```
Examples
Example 1: Build a docs chatbot
User prompt: "I want a chatbot that answers questions about my product documentation."
The agent will use Firecrawl to crawl the docs site, convert to markdown, chunk the content, store in a vector database, and build a RAG query pipeline.
Example 2: Monitor competitor content changes
User prompt: "Track when our competitor updates their pricing page."
The agent will schedule periodic Firecrawl scrapes, compare markdown diffs between runs, and alert on significant changes.
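For the monitoring flow above, a full text diff is often unnecessary: hashing the normalized markdown from each run is enough to detect that something changed. A sketch — the whitespace normalization rule is an assumption you may want to tune for the pages you track:

```typescript
// detectChange.ts — compare successive scrapes of the same page by hash.
import { createHash } from "node:crypto";

function contentHash(markdown: string): string {
  // Normalize whitespace so trivial reformatting doesn't trigger alerts.
  const normalized = markdown.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

function hasChanged(previousHash: string, markdown: string): boolean {
  return contentHash(markdown) !== previousHash;
}

const baseline = contentHash("## Pricing\nPro: $49/mo");
const same = hasChanged(baseline, "## Pricing\nPro:   $49/mo"); // whitespace only
const changed = hasChanged(baseline, "## Pricing\nPro: $59/mo"); // price changed
```

Persist the hash between runs (a file, a database row) and alert only when `hasChanged` returns true.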
Guidelines
- `scrapeUrl` for single pages — fast, returns markdown + metadata
- `crawlUrl` for entire sites — follows links, respects limits
- Markdown is the best LLM format — cleaner than HTML, preserves structure
- Structured extraction for data — use Zod/JSON schema to extract typed data
- Self-host for privacy — `docker run mendableai/firecrawl` for sensitive data
- Rate limits on cloud API — 500 pages/min on free tier
- Chunk markdown for RAG — 500-1500 char chunks with overlap work best
- Cache results — don't re-scrape unchanged pages
- `formats` array — request only what you need (markdown, html, extract)
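The splitIntoChunks helper shown earlier has no overlap; per the chunking guideline, a variant where neighboring chunks share a tail could look like this (the default sizes are starting points, not Firecrawl recommendations):

```typescript
// chunkWithOverlap.ts — fixed-size chunks that share `overlap` chars with
// their neighbor, so sentences cut at a boundary still appear whole in
// at least one chunk.
function chunkWithOverlap(text: string, size = 1000, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
    if (i + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

const chunks = chunkWithOverlap("a".repeat(2500), 1000, 200);
// 3 chunks: 0-1000, 800-1800, 1600-2500
```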