> experiment-bridge

Workflow 1.5: Bridge between idea discovery and auto review. Reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. Use when user says "实现实验", "implement experiments", "bridge", "从计划到跑实验", "deploy the plan", or has an experiment plan ready to execute.


Workflow 1.5: Experiment Bridge

Implement and deploy experiments from plan: $ARGUMENTS

Overview

This skill bridges Workflow 1 (idea discovery + method refinement) and Workflow 2 (auto review loop). It takes the experiment plan and turns it into running experiments with initial results.

Workflow 1 output:                    This skill:                        Workflow 2 input:
refine-logs/EXPERIMENT_PLAN.md   →   implement → deploy → collect   →   initial results ready
refine-logs/EXPERIMENT_TRACKER.md     code        /run-experiment        for /auto-review-loop
refine-logs/FINAL_PROPOSAL.md

Constants

  • AUTO_DEPLOY = true — Automatically deploy experiments after implementation. Set false to review code before deploying.
  • SANITY_FIRST = true — Run the sanity-stage experiment first (smallest, fastest) before launching the rest. Catches setup bugs early.
  • MAX_PARALLEL_RUNS = 4 — Maximum number of experiments to deploy in parallel (limited by available GPUs).

Override: /experiment-bridge "EXPERIMENT_PLAN.md" — auto deploy: false, max parallel: 2

Inputs

This skill expects one or more of:

  1. refine-logs/EXPERIMENT_PLAN.md (best) — claim-driven experiment roadmap from /experiment-plan
  2. refine-logs/EXPERIMENT_TRACKER.md — run-by-run execution table
  3. refine-logs/FINAL_PROPOSAL.md — method description for implementation context
  4. IDEA_REPORT.md — fallback if refine-logs don't exist

If none exist, ask the user what experiments to implement.

Workflow

Phase 1: Parse the Experiment Plan

Read EXPERIMENT_PLAN.md and extract:

  1. Run order and milestones — which experiments run first (sanity → baseline → main → ablation → polish)
  2. For each experiment block:
    • Dataset / split / task
    • Compared systems and variants
    • Metrics to compute
    • Setup details (backbone, hyperparameters, seeds)
    • Success criterion
    • Priority (MUST-RUN vs NICE-TO-HAVE)
  3. Compute budget — total estimated GPU-hours
  4. Method details from FINAL_PROPOSAL.md — what exactly to implement
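The extraction step above can be sketched in code. This is a minimal sketch, assuming the plan uses `## ` headings per experiment block and marks priority with a literal `MUST-RUN` string; the real EXPERIMENT_PLAN.md layout may differ, so treat the regex and field names as placeholders.

```python
import re

def parse_plan(text: str) -> list[dict]:
    """Split a markdown plan into experiment blocks with a priority tag.

    Assumes one "## " heading per experiment block (hypothetical layout).
    """
    blocks = re.split(r"^## ", text, flags=re.MULTILINE)[1:]
    plan = []
    for block in blocks:
        title, _, body = block.partition("\n")
        # Default to NICE-TO-HAVE unless the block is explicitly MUST-RUN.
        priority = "MUST-RUN" if "MUST-RUN" in body else "NICE-TO-HAVE"
        plan.append({"name": title.strip(), "priority": priority})
    return plan
```

In practice you would extract the dataset, metrics, and setup fields the same way, one `partition`/regex per field.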

Present a brief summary:

📋 Experiment plan loaded:
- Milestones: [N] (sanity → baseline → main → ablation)
- Must-run experiments: [N]
- Nice-to-have: [N]
- Estimated GPU-hours: [X]

Proceeding to implementation.

Phase 2: Implement Experiment Code

For each milestone (in order), write the experiment scripts:

  1. Check existing code — scan the project for existing experiment scripts, model code, data loaders. Reuse as much as possible.

  2. Implement missing pieces:

    • Training scripts with proper argparse (all hyperparameters configurable)
    • Evaluation scripts computing the specified metrics
    • Data loading / preprocessing if needed
    • Baseline implementations if not already present
    • Fixed random seeds for reproducibility
    • Results saved to JSON/CSV for later analysis
    • Proper logging (wandb if configured in AGENTS.md)
  3. Follow the plan's run order — implement sanity-stage experiments first, then baselines, then main method, then ablations.

  4. Self-review before deploying:

    • Are all hyperparameters from EXPERIMENT_PLAN.md reflected in argparse?
    • Is the random seed fixed and controllable?
    • Are results saved in a parseable format (JSON/CSV)?
    • Does the code match FINAL_PROPOSAL.md's method description?
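The self-review items above correspond to a concrete script shape. Below is a minimal skeleton, with hypothetical flag names and a placeholder metrics dict, showing the three requirements together: every hyperparameter behind argparse, a fixed controllable seed, and results written as JSON.

```python
import argparse
import json
import pathlib
import random

def set_seed(seed: int) -> None:
    # Fix every RNG the experiment touches (extend with numpy/torch if used).
    random.seed(seed)

def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="One experiment run (sketch).")
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--out", default="results/run.json")
    args = parser.parse_args(argv)

    set_seed(args.seed)
    # ... training / evaluation would go here ...
    metrics = {"seed": args.seed, "lr": args.lr, "key_metric": None}

    out = pathlib.Path(args.out)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2))  # parseable for Phase 5

if __name__ == "__main__":
    main()
```

Because results land in a JSON file keyed by the run's own hyperparameters, Phase 5 can reconstruct which configuration produced which number without scraping logs.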

Phase 3: Sanity Check (if SANITY_FIRST = true)

Before deploying the full experiment suite, run the sanity-stage experiment:

/run-experiment [sanity experiment command]

Wait for completion. Verify:

  • Training loop runs without errors
  • Metrics are computed and saved correctly
  • GPU memory usage is within bounds
  • Output format matches expectations

If sanity fails → fix the code, re-run. Do not proceed to full deployment with broken code.
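Part of that verification can be mechanical. A small check like the following, assuming the hypothetical JSON output format from Phase 2, confirms the sanity run actually produced a parseable metrics file before the full suite launches:

```python
import json
import pathlib

def sanity_ok(result_path: str, required_keys=("key_metric", "seed")) -> bool:
    """Return True only if the sanity run wrote valid JSON with the
    expected metric keys (key names are assumptions; adjust to the plan)."""
    p = pathlib.Path(result_path)
    if not p.exists():
        return False
    try:
        metrics = json.loads(p.read_text())
    except json.JSONDecodeError:
        return False
    return all(k in metrics for k in required_keys)
```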

Phase 4: Deploy Full Experiments

Deploy experiments following the plan's milestone order:

/run-experiment [experiment commands]

For each milestone:

  1. Deploy experiments in parallel (up to MAX_PARALLEL_RUNS)
  2. Use /monitor-experiment to track progress
  3. Collect results as experiments complete
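The bounded-parallelism rule can be sketched as follows. This is one possible implementation, not the skill's actual deployment mechanism: it launches each experiment command as a subprocess with at most MAX_PARALLEL_RUNS running at once, and returns exit codes in order.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_RUNS = 4

def deploy(commands: list[str], max_parallel: int = MAX_PARALLEL_RUNS) -> list[int]:
    """Run shell commands with bounded parallelism; return exit codes
    in the same order as the input commands."""
    def run(cmd: str) -> int:
        # Each experiment is its own process; a nonzero code flags a failure.
        return subprocess.run(cmd, shell=True).returncode
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))
```

Threads are sufficient here because each worker only blocks on a child process; the GPU work happens in the subprocesses themselves.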

🚦 Checkpoint (if AUTO_DEPLOY = false):

🔧 Code implementation complete. Ready to deploy:

Milestone 0 (sanity): [status — passed/pending]
Milestone 1 (baseline): [N experiments, ~X GPU-hours]
Milestone 2 (main method): [N experiments, ~X GPU-hours]
Milestone 3 (ablations): [N experiments, ~X GPU-hours]

Total estimated: ~X GPU-hours on [N] GPUs

Deploy now? Or review the code first?

Phase 5: Collect Initial Results

As experiments complete:

  1. Parse output files (JSON/CSV/logs) for key metrics
  2. Update refine-logs/EXPERIMENT_TRACKER.md — fill in Status and Notes columns
  3. Check success criteria from EXPERIMENT_PLAN.md — did each experiment meet its bar?
  4. Write initial results summary:
# Initial Experiment Results

**Date**: [today]
**Plan**: refine-logs/EXPERIMENT_PLAN.md

## Results by Milestone

### M0: Sanity — PASSED
- [result]

### M1: Baselines
| Run | System | Key Metric | Status |
|-----|--------|-----------|--------|
| R001 | baseline_1 | X.XX | DONE |

### M2: Main Method
| Run | System | Key Metric | Status |
|-----|--------|-----------|--------|
| R003 | our_method | X.XX | DONE |

### M3: Ablations
...

## Summary
- [X/Y] must-run experiments completed
- Main result: [positive/negative/inconclusive]
- Ready for /auto-review-loop: [YES/NO]

## Next Step
→ /auto-review-loop "[topic]"
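Steps 1–3 of the collection phase reduce to a small parsing loop. The sketch below assumes the hypothetical one-JSON-file-per-run layout from Phase 2 and a single threshold-style success criterion per metric; it emits tracker-style markdown rows for EXPERIMENT_TRACKER.md.

```python
import json
import pathlib

def collect(run_dir: str, metric: str, threshold: float) -> list[str]:
    """Parse each run's JSON results and compare against its success bar.

    Returns markdown table rows: | run | metric value | status |.
    """
    rows = []
    for path in sorted(pathlib.Path(run_dir).glob("*.json")):
        m = json.loads(path.read_text())
        # Missing metric counts as a failure against the success criterion.
        status = "DONE" if m.get(metric, float("-inf")) >= threshold else "BELOW BAR"
        rows.append(f"| {path.stem} | {m.get(metric, 'n/a')} | {status} |")
    return rows
```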

Phase 6: Handoff

Present final status:

🔬 Experiment bridge complete:
- Implemented: [N] experiment scripts
- Deployed: [N] experiments on [M] GPUs
- Completed: [X/Y] must-run, [A/B] nice-to-have
- Main result: [one sentence]

Results: refine-logs/EXPERIMENT_RESULTS.md
Tracker: refine-logs/EXPERIMENT_TRACKER.md

Ready for Workflow 2:
→ /auto-review-loop "[topic]"

Key Rules

  • Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
  • Follow the plan. Do not invent experiments not in EXPERIMENT_PLAN.md. If you think something is missing, note it but don't add it.
  • Sanity first. Never deploy a full suite without verifying the sanity stage passes.
  • Reuse existing code. Scan the project before writing new scripts. Extend, don't duplicate.
  • Save everything as JSON/CSV. The auto-review-loop needs parseable results, not just terminal output.
  • Update the tracker. EXPERIMENT_TRACKER.md should reflect real status after each run completes.
  • Don't wait forever. If an experiment exceeds 2x its estimated time, flag it and move on to the next milestone.
  • Budget awareness. Track GPU-hours against the plan's budget. Warn if approaching the limit.

Composing with Other Skills

/idea-discovery "direction"          ← Workflow 1: find + refine + plan
/experiment-bridge                   ← you are here (Workflow 1.5: implement + deploy)
/auto-review-loop "topic"            ← Workflow 2: review + iterate
/paper-writing "NARRATIVE_REPORT.md" ← Workflow 3: write the paper

Or use /research-pipeline for the full end-to-end flow (includes this bridge).

Stats: 0 installs/wk · 3.4K GitHub stars · first seen Mar 23, 2026
Repo: wanshuiyin/Auto-claude-code-research-in-sleep (by wanshuiyin)