
RL training of Llama 3.1 8b on multi-gpus #78

Open

wants to merge 77 commits into base: main
Conversation

@AlexPiche (Collaborator) commented Oct 31, 2024

Training Llama 3.1 8b on multi-gpus

The goal of this PR is to train Llama 3.1 70b on GSM8k efficiently.
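Since the diff itself isn't visible here, below is a minimal sketch of what multi-GPU fine-tuning of Llama 3.1 8B can look like with torchrun and DistributedDataParallel. This is not the code in this PR: the checkpoint id, the learning rate, and the placeholder batch (standing in for reward-scored GSM8k rollouts, which the grpo_wild_chat base branch hints at) are all assumptions for illustration.

```python
# Hypothetical sketch, not this PR's implementation.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import AutoModelForCausalLM, AutoTokenizer


def main():
    # One process per GPU; torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model_id = "meta-llama/Llama-3.1-8B"  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # assumed lr

    # Placeholder batch standing in for sampled GSM8k rollouts scored by a
    # reward function; the real PR presumably builds these from RL rollouts.
    batch = tokenizer(
        ["Question: 2 + 2 = ? Answer: 4"], return_tensors="pt"
    ).to(local_rank)

    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()  # DDP all-reduces gradients across ranks here
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With this setup each rank holds a full model replica and DDP averages gradients across GPUs on each backward pass; an 8B model in bf16 fits per GPU, whereas the 70B goal mentioned above would need sharding (e.g. FSDP or DeepSpeed) instead.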

@AlexPiche changed the base branch from main to grpo_wild_chat on October 31, 2024 02:42
@rizar (Collaborator) commented Oct 31, 2024

Will address #77 and #76.

Base automatically changed from grpo_wild_chat to main October 31, 2024 13:28