> llamaindex
Assists with building RAG pipelines, knowledge assistants, and data-augmented LLM applications using LlamaIndex. Use when ingesting documents, configuring retrieval strategies, building query engines, or creating multi-step agents. Trigger words: llamaindex, rag, retrieval augmented generation, vector index, query engine, document loader, knowledge base.
curl "https://skillshub.wtf/TerminalSkills/skills/llamaindex?format=md"

LlamaIndex
Overview
LlamaIndex is a data framework for building RAG pipelines, knowledge assistants, and data-augmented LLM applications. It provides document loading from 300+ sources, flexible chunking strategies, multiple index types, hybrid retrieval with reranking, and production evaluation tools for question-answering systems.
Instructions
- When ingesting documents, use `SimpleDirectoryReader` for local files or LlamaHub connectors for SaaS platforms, and run documents through an `IngestionPipeline` with metadata extractors (title, summary) and deduplication.
- When chunking, start with `SentenceSplitter` at 1024 tokens with a 200-token overlap, use `MarkdownNodeParser` for structured documents and `CodeSplitter` for code, and adjust based on evaluation results.
- When indexing, use `VectorStoreIndex` as the default for most RAG, `KnowledgeGraphIndex` for entity relationships, and `DocumentSummaryIndex` for per-document summaries.
- When retrieving, implement hybrid retrieval (vector + keyword) for production, add a reranker (`CohereRerank`) after retrieval for improved relevance, and set `similarity_top_k` based on the context window (3-5 for large models, 2-3 for smaller ones).
- When building query engines, use `RetrieverQueryEngine` for standard RAG, `CitationQueryEngine` for responses with source attribution, and `SubQuestionQueryEngine` for complex multi-part queries.
- When creating agents, use `ReActAgent` with tools wrapping query engines (`QueryEngineTool`), functions, and other agents for multi-step reasoning.
- When evaluating, use `CorrectnessEvaluator`, `FaithfulnessEvaluator`, and `RelevancyEvaluator` on a test set before deploying.
Examples
Example 1: Build a RAG pipeline over company documentation
User request: "Create a question-answering system over our internal docs"
Actions:
- Load documents with `SimpleDirectoryReader` and extract metadata (title, summary)
- Chunk with `SentenceSplitter` (1024 tokens, 200 overlap) through an `IngestionPipeline`
- Create a `VectorStoreIndex` with OpenAI embeddings and configure hybrid retrieval
- Build a `CitationQueryEngine` for answers with source references
Output: A RAG system that answers questions with citations from company documentation.
Example 2: Create a multi-source research agent
User request: "Build an agent that can search across our docs, database, and web"
Actions:
- Create separate query engines for each data source (vector index, SQL, web search)
- Wrap each engine as a `QueryEngineTool` with descriptive tool descriptions
- Build a `ReActAgent` that routes questions to the appropriate tool
- Add a `SubQuestionQueryEngine` for complex queries requiring multiple sources
Output: An intelligent agent that reasons about which data source to query and synthesizes multi-source answers.
Guidelines
- Use `SentenceSplitter` with 1024-token chunks and 200-token overlap as the starting point.
- Always add metadata extractors to the ingestion pipeline; title and summary metadata improve retrieval significantly.
- Use hybrid retrieval (vector + keyword) for production; pure vector search misses exact term matches.
- Add a reranker (`CohereRerank`) after retrieval to improve result relevance at small cost.
- Evaluate with `CorrectnessEvaluator` on a test set before deploying; subjective quality assessment does not scale.
- Set `similarity_top_k` based on context window: 3-5 chunks for large models, 2-3 for smaller models.
- Use `IngestionPipeline` with deduplication for incremental data updates; do not re-embed unchanged documents.