> browse/registry


found 4 skills in registry

> grpo-rl-training

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

#Post-Training, #Reinforcement Learning, #GRPO

Orchestra-Research

> openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

#Post-Training, #OpenRLHF, #RLHF

Orchestra-Research

> verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

#Reinforcement Learning, #RLHF, #GRPO

Orchestra-Research

> fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

#Post-Training, #TRL, #Reinforcement Learning

Orchestra-Research
