"RuntimeError: Internal Triton PTX codegen error" is raised when I train shakespeare_char with a GPU #525

shenbb · 2024-06-18T02:29:44Z

python3 train.py config/train_shakespeare_char.py

Overriding config with config/train_shakespeare_char.py:

# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu' # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
Traceback (most recent call last):
File "train.py", line 264, in <module>
losses = estimate_loss()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "train.py", line 224, in estimate_loss
logits, loss = model(X, Y)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 786, in _convert_frame
result = inner_convert(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
return _compile(
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
out_code = transform_code_object(code, transform)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 500, in transform
tracer.run()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
super().run()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 2268, in RETURN_VALUE
self.output.compile_subgraph(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1001, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1178, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1251, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1232, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 1731, in __call__
return compile_fx(model, inputs, config_patches=self.config)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 1330, in compile_fx
return aot_autograd(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/backends/common.py", line 58, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 903, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 628, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 443, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 648, in aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 119, in aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 1257, in fw_compiler_base
return inner_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 438, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 714, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 1307, in compile_to_fn
return self.compile_to_module().call
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 1254, in compile_to_module
mod = PyCodeCache.load_by_key_path(
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2160, in load_by_key_path
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_libra/6z/c6zptqfvl4uwgoca6tk4qimwczeni4sq2plv5hxtx7vncbopqccc.py", line 1162, in <module>
async_compile.wait(globals())
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2715, in wait
scope[key] = result.result()
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2522, in result
self.future.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-863569, line 636; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 636; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 638; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 638; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 640; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 640; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 642; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 642; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True

mishra011 · 2024-06-19T06:41:04Z

Facing the same issue when running python train.py config/train_shakespeare_char.py on a Colab GPU.

Elrashid · 2024-06-21T18:17:09Z

@mishra011 @shenbb

I tried running the following code:

!pip install tiktoken
!git clone https://github.com/karpathy/nanoGPT.git
%cd nanoGPT
!python train.py config/train_shakespeare_char.py
!python sample.py --out_dir=out-shakespeare-char

Found a GPU Compatibility Issue:

A100-SXM4-40GB: Works well and can handle the training without errors.
V100-SXM2-16GB: Fails with the Triton PTX codegen error.

If you're using the standard V100-SXM2-16GB GPU, expect this error. Judging by the ptxas messages in the original report, the cause is missing bf16 support rather than limited memory: the generated code needs sm_80 or higher, and the V100 is sm_70.

Recommendation (this worked for me, didn't have time to dig down further):

To avoid this error, upgrade to Colab Pro and ensure you select the A100-SXM4-40GB GPU in your runtime settings. This should resolve the issue and allow your model to train successfully.
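A quick way to tell in advance whether a given GPU will hit this is to compare its compute capability against the sm_80 threshold named in the ptxas messages. The sketch below is only illustrative: `has_native_bf16`, `current_capability`, and `describe_gpu` are hypothetical helper names, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`.

```python
from typing import Optional, Tuple

# Threshold taken from the ptxas messages in this thread:
# "Feature '.bf16' requires .target sm_80 or higher".
MIN_BF16_CAPABILITY = (8, 0)


def has_native_bf16(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is sm_80 or newer."""
    return tuple(capability) >= MIN_BF16_CAPABILITY


def current_capability() -> Optional[Tuple[int, int]]:
    """Return the active GPU's compute capability, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


def describe_gpu() -> str:
    """Summarize whether the visible GPU meets the sm_80 threshold."""
    cap = current_capability()
    if cap is None:
        return "no CUDA device visible"
    verdict = "native bf16" if has_native_bf16(cap) else "no native bf16"
    return f"sm_{cap[0]}{cap[1]}: {verdict}"


if __name__ == "__main__":
    print(describe_gpu())
```

For the cards mentioned in this thread, the V100 is (7, 0), the T4 is (7, 5), and the A100 is (8, 0), so only the A100 passes the check.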

lichengshen · 2024-06-25T08:50:44Z

The issue comes from torch.compile() so you can pass --compile=False as a workaround.
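As a rough sketch of how that workaround could be applied only where needed, the helper below builds the training command from a GPU's compute capability. `train_command` is a hypothetical name, and the `--dtype=float16` branch is an assumption that nobody in this thread has tested: it relies on train.py exposing a `dtype` setting that can be overridden the same way as `compile`.

```python
from typing import List, Tuple

BASE_CMD = ["python", "train.py", "config/train_shakespeare_char.py"]


def train_command(capability: Tuple[int, int], keep_compile: bool = False) -> List[str]:
    """Build a train.py command line for a GPU with the given (major, minor) capability."""
    if tuple(capability) >= (8, 0):
        # sm_80 or newer: the bf16 kernels should assemble, no override needed.
        return list(BASE_CMD)
    if keep_compile:
        # Assumption, untested in this thread: keep torch.compile but avoid bf16.
        return BASE_CMD + ["--dtype=float16"]
    # Workaround reported as working in this thread.
    return BASE_CMD + ["--compile=False"]


if __name__ == "__main__":
    # Example: a T4 (sm_75) gets the --compile=False workaround.
    print(" ".join(train_command((7, 5))))
```

Disabling compilation is the conservative default here because it is the only option confirmed by other commenters.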

xinge449 · 2024-06-27T18:47:34Z

The issue comes from torch.compile() so you can pass --compile=False as a workaround.

This worked on my end, thank you very much. Thanks a lot.

lise-brinck · 2024-08-13T06:15:15Z

I am facing the same issue on a T4 GPU:

E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Error in subprocess
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] concurrent.futures.process._RemoteTraceback:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] """
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Traceback (most recent call last):
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/compiler.py", line 292, in make_cubin
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     subprocess.run(cmd, shell=True, check=True)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/subprocess.py", line 528, in run
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     raise CalledProcessError(retcode, process.args,
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] subprocess.CalledProcessError: Command '/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_75 /tmp/tmpjp5bgu11.ptx -o /tmp/tmpjp5bgu11.ptx.o 2> /tmp/tmp1ew0x1sq.log' returned non-zero exit status 255.
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] During handling of the above exception, another exception occurred:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Traceback (most recent call last):
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     r = call_item.fn(*call_item.args, **call_item.kwargs)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 218, in do_job
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     result = job()
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/runtime/compile_tasks.py", line 69, in _worker_compile_triton
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     load_kernel().precompile(warm_cache_only=True)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 232, in precompile
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     compiled_binary, launcher = self._precompile_config(
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 416, in _precompile_config
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     triton.compile(*compile_args, **compile_kwargs),
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/compiler/compiler.py", line 282, in compile
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     next_module = compile_ir(module, metadata)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/compiler.py", line 320, in <lambda>
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/compiler.py", line 297, in make_cubin
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     raise RuntimeError(f'Internal Triton PTX codegen error: \n{log}')
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] RuntimeError: Internal Triton PTX codegen error:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas fatal   : Ptx assembly aborted due to errors
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] """
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] The above exception was the direct cause of the following exception:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Traceback (most recent call last):
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 203, in callback
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     result = future.result()
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/concurrent/futures/_base.py", line 439, in result
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     return self.__get_result()
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     raise self._exception
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] RuntimeError: Internal Triton PTX codegen error:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas fatal   : Ptx assembly aborted due to errors

PyTorch version: 2.4.0+cu118
nvcc version:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Output of nvidia-smi:

Tue Aug 13 06:11:55 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8    14W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
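The long log above boils down to two facts: which target ptxas was asked to build for, and which target the failing instructions need. A minimal sketch of pulling both out of such a log follows; `summarize_ptxas` is a hypothetical helper, and the regular expressions assume the message wording seen in this thread.

```python
import re
from typing import Dict, Optional

# Patterns assume the ptxas wording quoted in this thread.
REQUIRES_RE = re.compile(r"Feature '([^']+)' requires \.target (sm_\d+) or higher")
GPU_NAME_RE = re.compile(r"--gpu-name=(sm_\d+)")

SAMPLE_LOG = """\
Command 'ptxas -lineinfo -v --gpu-name=sm_75 /tmp/tmpjp5bgu11.ptx' returned non-zero exit status 255.
ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas fatal   : Ptx assembly aborted due to errors
"""


def summarize_ptxas(log: str) -> Dict[str, object]:
    """Extract the build target and the per-feature minimum targets from a ptxas log."""
    requires: Dict[str, str] = {}
    for feature, target in REQUIRES_RE.findall(log):
        requires[feature] = target  # repeated lines collapse to one entry
    match = GPU_NAME_RE.search(log)
    compiled_for: Optional[str] = match.group(1) if match else None
    return {"compiled_for": compiled_for, "requires": requires}


if __name__ == "__main__":
    print(summarize_ptxas(SAMPLE_LOG))
```

On the sample, the summary shows a build for sm_75 that uses two features needing sm_80, which matches the T4 shown in the nvidia-smi output.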
