> idea-discovery
Workflow 1: Full idea discovery pipeline. Orchestrates research-lit → idea-creator → novelty-check → research-review to go from a broad research direction to validated, pilot-tested ideas. Use when the user says "找idea全流程" ("full idea-finding workflow"), "idea discovery pipeline", "从零开始找方向" ("find a direction from scratch"), or wants the complete idea exploration workflow.
Workflow 1: Idea Discovery Pipeline
Orchestrate a complete idea discovery workflow for: $ARGUMENTS
Overview
This skill chains sub-skills into a single automated pipeline:
/research-lit (survey) → /idea-creator (brainstorm) → /novelty-check (verify novelty) → /research-review (critical feedback) → /research-refine-pipeline (refine method + plan experiments)
Each phase builds on the previous one's output. The final deliverables are a validated IDEA_REPORT.md with ranked ideas, plus a refined proposal (refine-logs/FINAL_PROPOSAL.md) and experiment plan (refine-logs/EXPERIMENT_PLAN.md) for the top idea.
Constants
- `PILOT_MAX_HOURS = 2` — Skip any pilot experiment estimated to take more than 2 hours per GPU; flag it as "needs manual pilot" in the report.
- `PILOT_TIMEOUT_HOURS = 3` — Hard timeout: kill any running pilot that exceeds 3 hours. Collect partial results if available.
- `MAX_PILOT_IDEAS = 3` — Run pilots for at most 3 top ideas in parallel. Additional ideas are validated on paper only.
- `MAX_TOTAL_GPU_HOURS = 8` — Total GPU budget across all pilots. If exceeded, skip the remaining pilots and note this in the report.
- `AUTO_PROCEED = true` — If the user doesn't respond at a checkpoint, automatically proceed with the best option after presenting results. Set to `false` to always wait for explicit user confirmation.
- `REVIEWER_MODEL = gpt-5.4` — Model used via a secondary Codex agent. Must be an OpenAI model (e.g., `gpt-5.4`, `o3`, `gpt-4o`). Passed to sub-skills.
- `ARXIV_DOWNLOAD = false` — When `true`, `/research-lit` downloads the top relevant arXiv PDFs during Phase 1. When `false` (default), it only fetches metadata. Passed through to `/research-lit`.
💡 These are defaults. Override by telling the skill, e.g., `/idea-discovery "topic" — pilot budget: 4h per idea, 20h total` or `/idea-discovery "topic" — arxiv download: true`.
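As a minimal sketch of how these budget constants could be enforced (not part of the skill itself — `sleep 2` stands in for a real pilot launch command, and the idea names are illustrative):

```shell
# Enforce the pilot constants from above: each pilot is hard-killed at
# PILOT_TIMEOUT_HOURS, and no new pilot starts once the GPU budget is spent.
PILOT_TIMEOUT_HOURS=3
MAX_TOTAL_GPU_HOURS=8
used_gpu_hours=0
for idea in idea_1 idea_2 idea_3; do   # MAX_PILOT_IDEAS = 3
  if [ "$used_gpu_hours" -ge "$MAX_TOTAL_GPU_HOURS" ]; then
    echo "GPU budget exhausted: skipping pilot for $idea"
    continue
  fi
  start=$(date +%s)
  # coreutils `timeout` sends SIGTERM at the limit; partial logs survive on disk.
  timeout "${PILOT_TIMEOUT_HOURS}h" sleep 2 \
    || echo "pilot for $idea timed out; collecting partial results"
  # Charge elapsed wall-clock time, rounded up to whole GPU hours.
  used_gpu_hours=$(( used_gpu_hours + ( $(date +%s) - start + 3599 ) / 3600 ))
done
echo "total GPU hours charged: $used_gpu_hours"
```

Rounding up to whole hours keeps the accounting conservative: a pilot that runs 10 minutes still consumes a full unit of the 8-hour budget.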
Pipeline
Phase 1: Literature Survey
Invoke /research-lit to map the research landscape:
/research-lit "$ARGUMENTS"
What this does:
- Search arXiv, Google Scholar, Semantic Scholar for recent papers
- Build a landscape map: sub-directions, approaches, open problems
- Identify structural gaps and recurring limitations
- Output a literature summary (saved to working notes)
🚦 Checkpoint: Present the landscape summary to the user. Ask:
📚 Literature survey complete. Here's what I found:
- [key findings, gaps, open problems]
Does this match your understanding? Should I adjust the scope before generating ideas?
(If no response, I'll proceed with the top-ranked direction.)
- User approves (or no response + AUTO_PROCEED=true) → proceed to Phase 2 with best direction.
- User requests changes (e.g., "focus more on X", "ignore Y", "too broad") → refine the search with updated queries, re-run `/research-lit` with the adjusted scope, and present again. Repeat until the user is satisfied.
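The checkpoint policy can be sketched as follows (the wait time and decision strings are illustrative — a real run would wait far longer than one second):

```shell
# Checkpoint fallback sketch: wait briefly for a reply; if none arrives and
# AUTO_PROCEED is true, continue with the top-ranked option.
AUTO_PROCEED=true
CHECKPOINT_WAIT_SECS=1   # illustrative only; not a constant defined by this skill
if read -r -t "$CHECKPOINT_WAIT_SECS" reply && [ -n "$reply" ]; then
  decision="$reply"                              # user answered: follow it
elif [ "$AUTO_PROCEED" = true ]; then
  decision="proceed with top-ranked direction"   # silent checkpoint: auto-proceed
else
  decision="wait for explicit confirmation"
fi
echo "checkpoint decision: $decision"
```

`read -t` returns non-zero on timeout (or closed stdin), which is what routes a silent checkpoint into the AUTO_PROCEED branch.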
Phase 2: Idea Generation + Filtering + Pilots
Invoke /idea-creator with the landscape context:
/idea-creator "$ARGUMENTS"
What this does:
- Brainstorm 8-12 concrete ideas via GPT-5.4 xhigh
- Filter by feasibility, compute cost, quick novelty search
- Deep validate top ideas (full novelty check + devil's advocate)
- Run parallel pilot experiments on available GPUs (top 2-3 ideas)
- Rank by empirical signal
- Output `IDEA_REPORT.md`
🚦 Checkpoint: Present the ranked ideas from IDEA_REPORT.md to the user. Ask:
💡 Generated X ideas, filtered to Y, piloted Z. Top results:
1. [Idea 1] — Pilot: POSITIVE (+X%)
2. [Idea 2] — Pilot: WEAK POSITIVE (+Y%)
3. [Idea 3] — Pilot: NEGATIVE, eliminated
Which ideas should I validate further? Or should I regenerate with different constraints?
(If no response, I'll proceed with the top-ranked ideas.)
- User picks ideas (or no response + AUTO_PROCEED=true) → proceed to Phase 3 with top-ranked ideas.
- User unhappy with all ideas → collect feedback ("what's missing?", "what direction do you prefer?"), update the prompt with user's constraints, and re-run Phase 2 (idea generation). Repeat until the user selects at least 1 idea.
- User wants to adjust scope → go back to Phase 1 with refined direction.
Phase 3: Deep Novelty Verification
For each top idea (i.e., each with a positive pilot signal), run a thorough novelty check:
/novelty-check "[top idea 1 description]"
/novelty-check "[top idea 2 description]"
What this does:
- Multi-source literature search (arXiv, Scholar, Semantic Scholar)
- Cross-verify with GPT-5.4 xhigh
- Check for concurrent work (last 3-6 months)
- Identify closest existing work and differentiation points
Update IDEA_REPORT.md with deep novelty results. Eliminate any idea that turns out to be already published.
Phase 4: External Critical Review
For the surviving top idea(s), get brutal feedback:
/research-review "[top idea with hypothesis + pilot results]"
What this does:
- GPT-5.4 xhigh acts as a senior reviewer (NeurIPS/ICML level)
- Scores the idea, identifies weaknesses, suggests minimum viable improvements
- Provides concrete feedback on experimental design
Update IDEA_REPORT.md with reviewer feedback and revised plan.
Phase 4.5: Method Refinement + Experiment Planning
After review, refine the top idea into a concrete proposal and plan experiments:
/research-refine-pipeline "[top idea description + pilot results + reviewer feedback]"
What this does:
- Freeze a Problem Anchor to prevent scope drift
- Iteratively refine the method via GPT-5.4 review (up to 5 rounds, until score ≥ 9)
- Generate a claim-driven experiment roadmap with ablations, budgets, and run order
- Output: `refine-logs/FINAL_PROPOSAL.md`, `refine-logs/EXPERIMENT_PLAN.md`, `refine-logs/EXPERIMENT_TRACKER.md`
🚦 Checkpoint: Present the refined proposal summary:
🔬 Method refined and experiment plan ready:
- Problem anchor: [anchored problem]
- Method thesis: [one sentence]
- Dominant contribution: [what's new]
- Must-run experiments: [N blocks]
- First 3 runs to launch: [list]
Proceed to implementation? Or adjust the proposal?
- User approves (or AUTO_PROCEED=true) → proceed to Final Report.
- User requests changes → pass feedback to `/research-refine` for another round.
- Lite mode: If the reviewer score is < 6 or the pilot was weak, run `/research-refine` only (skip `/experiment-plan`) and note the remaining risks in the report.
Phase 5: Final Report
Finalize IDEA_REPORT.md with all accumulated information:
# Idea Discovery Report
**Direction**: $ARGUMENTS
**Date**: [today]
**Pipeline**: research-lit → idea-creator → novelty-check → research-review → research-refine-pipeline
## Executive Summary
[2-3 sentences: best idea, key evidence, recommended next step]
## Literature Landscape
[from Phase 1]
## Ranked Ideas
[from Phase 2, updated with Phase 3-4 results]
### 🏆 Idea 1: [title] — RECOMMENDED
- Pilot: POSITIVE (+X%)
- Novelty: CONFIRMED (closest: [paper], differentiation: [what's different])
- Reviewer score: X/10
- Next step: implement full experiment → /auto-review-loop
### Idea 2: [title] — BACKUP
...
## Eliminated Ideas
[ideas killed at each phase, with reasons]
## Refined Proposal
- Proposal: `refine-logs/FINAL_PROPOSAL.md`
- Experiment plan: `refine-logs/EXPERIMENT_PLAN.md`
- Tracker: `refine-logs/EXPERIMENT_TRACKER.md`
## Next Steps
- [ ] /run-experiment to deploy experiments from the plan
- [ ] /auto-review-loop to iterate until submission-ready
- [ ] Or invoke /research-pipeline for the complete end-to-end flow
Key Rules
- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission — just do it silently.
- Don't skip phases. Each phase filters and validates — skipping leads to wasted effort later.
- Checkpoint between phases. Briefly summarize what was found before moving on.
- Kill ideas early. It's better to kill 10 bad ideas in Phase 3 than to implement one and fail.
- Empirical signal > theoretical appeal. An idea with a positive pilot outranks a "sounds great" idea without evidence.
- Document everything. Dead ends are just as valuable as successes for future reference.
- Be honest with the reviewer. Include negative results and failed pilots in the review prompt.
- Feishu notifications are optional. If `~/.codex/feishu.json` exists, send `checkpoint` at each phase transition and `pipeline_done` at the final report. If absent or off, skip silently.
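The large-file fallback can be sketched like this (file name and content are illustrative):

```shell
# Write a large report in chunks via Bash heredocs: the quoted 'EOF'
# delimiter prevents variable expansion, > creates the file, >> appends.
cat << 'EOF' > IDEA_REPORT.md
# Idea Discovery Report
**Direction**: example direction
EOF

cat << 'EOF' >> IDEA_REPORT.md
## Executive Summary
Second chunk, appended so no single write is too large.
EOF
```

Quoting the delimiter (`'EOF'` rather than `EOF`) matters when the report contains `$` or backticks, since it disables shell expansion inside the heredoc body.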
Composing with Workflow 2
After this pipeline produces a validated top idea:
/idea-discovery "direction" ← you are here (Workflow 1, includes method refinement + experiment planning)
/run-experiment ← deploy experiments from the plan
/auto-review-loop "top idea" ← Workflow 2: iterate until submission-ready
Or use /research-pipeline for the full end-to-end flow.