"RuntimeError: Internal Triton PTX codegen error" is raised when I train shakespeare_char with a GPU #525

Open
shenbb opened this issue Jun 18, 2024 · 5 comments

@shenbb

shenbb commented Jun 18, 2024

python3 train.py config/train_shakespeare_char.py

Overriding config with config/train_shakespeare_char.py:

# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)

n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
Traceback (most recent call last):
File "train.py", line 264, in <module>
losses = estimate_loss()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "train.py", line 224, in estimate_loss
logits, loss = model(X, Y)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 786, in _convert_frame
result = inner_convert(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
return _compile(
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
out_code = transform_code_object(code, transform)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 500, in transform
tracer.run()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
super().run()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 2268, in RETURN_VALUE
self.output.compile_subgraph(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1001, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1178, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1251, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1232, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 1731, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 1330, in compile_fx
return aot_autograd(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/backends/common.py", line 58, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 903, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 628, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 443, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 648, in aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 119, in aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 1257, in fw_compiler_base
return inner_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 438, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 714, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 1307, in compile_to_fn
return self.compile_to_module().call
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 1254, in compile_to_module
mod = PyCodeCache.load_by_key_path(
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2160, in load_by_key_path
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_libra/6z/c6zptqfvl4uwgoca6tk4qimwczeni4sq2plv5hxtx7vncbopqccc.py", line 1162, in <module>
async_compile.wait(globals())
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2715, in wait
scope[key] = result.result()
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2522, in result
self.future.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-863569, line 636; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 636; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 638; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 638; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 640; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 640; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 642; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 642; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
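
The ptxas lines above repeat the same two complaints for every affected PTX line. As a minimal sketch (not part of nanoGPT, PyTorch or Triton; `summarize_ptxas_errors` is a hypothetical helper name), the log can be collapsed to the distinct features and the minimum target each one asks for:

```python
import re

# Matches lines such as:
#   ptxas /tmp/x.ptx, line 636; error : Feature '.bf16' requires .target sm_80 or higher
# The spacing around "error :" differs between logs, so it is matched loosely.
_PTXAS_FEATURE = re.compile(
    r"error\s*:\s*Feature '([^']+)' requires \.target sm_(\d+) or higher"
)


def summarize_ptxas_errors(log):
    """Map each distinct feature named in a ptxas log to the sm target it requires."""
    required = {}
    for feature, sm in _PTXAS_FEATURE.findall(log):
        # Keep the highest requirement if a feature appears more than once.
        required[feature] = max(required.get(feature, 0), int(sm))
    return required


if __name__ == "__main__":
    # Two lines copied from the traceback above, plus the final "fatal" line.
    sample = (
        "ptxas /tmp/compile-ptx-src-863569, line 636; error : "
        "Feature '.bf16' requires .target sm_80 or higher\n"
        "ptxas /tmp/compile-ptx-src-863569, line 636; error : "
        "Feature 'cvt.bf16.f32' requires .target sm_80 or higher\n"
        "ptxas fatal : Ptx assembly aborted due to errors\n"
    )
    print(summarize_ptxas_errors(sample))
```

For this traceback, every error comes down to bf16 instructions that need an sm_80 or newer target.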

@mishra011

Facing the same issue when running python train.py config/train_shakespeare_char.py on a Colab GPU.

@Elrashid

@mishra011 @shenbb

I tried running the following code:

!pip install tiktoken
!git clone https://github.com/karpathy/nanoGPT.git
%cd nanoGPT
!python train.py config/train_shakespeare_char.py
!python sample.py --out_dir=out-shakespeare-char

Found a GPU compatibility issue:

  • A100-SXM4-40GB: Training runs without errors.
  • V100-SXM2-16GB: Fails with the Triton PTX codegen error.

If you're using the standard V100-SXM2-16GB GPU, expect this error. Judging by the ptxas output above, the cause is not limited memory: the compiled kernels use bf16 instructions, which require sm_80 or higher, and the V100 is an sm_70 device.

Recommendation (this worked for me; I didn't have time to dig further):

To avoid this error, upgrade to Colab Pro and select the A100-SXM4-40GB GPU in your runtime settings. With that GPU the model trained successfully for me.
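
The A100/V100 difference lines up with the `sm_80` requirement in the ptxas output. A rough sketch of that check follows. The capability values are NVIDIA's published compute capabilities for the GPUs named in this thread, and `native_bf16` is a hypothetical helper, not a PyTorch API. On a live machine, `torch.cuda.get_device_capability()` returns the same kind of `(major, minor)` pair.

```python
# Compute capabilities of the GPUs mentioned in this thread.
KNOWN_CAPABILITIES = {
    "A100-SXM4-40GB": (8, 0),
    "V100-SXM2-16GB": (7, 0),
    "Tesla T4": (7, 5),
}


def native_bf16(capability):
    """True if the device meets the sm_80 minimum that ptxas asks for."""
    return tuple(capability) >= (8, 0)


if __name__ == "__main__":
    for name, cap in KNOWN_CAPABILITIES.items():
        print(f"{name}: sm_{cap[0]}{cap[1]}, native bf16 = {native_bf16(cap)}")
```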

@lichengshen

The issue comes from torch.compile() so you can pass --compile=False as a workaround.
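
A sketch of how that workaround could be applied only where it is needed. `--compile=False` is the flag from the comment above; `train_command` is a hypothetical helper, and the `(8, 0)` threshold is taken from the `sm_80` requirement in the ptxas errors. The torch calls are guarded so the script still runs where PyTorch is not installed.

```python
import sys


def train_command(capability, config="config/train_shakespeare_char.py"):
    """Build a train.py command line, disabling torch.compile on pre-sm_80 GPUs."""
    cmd = [sys.executable, "train.py", config]
    if tuple(capability) < (8, 0):
        # Workaround from this thread: skip torch.compile so Triton never
        # emits bf16 PTX for a GPU whose target cannot assemble it.
        cmd.append("--compile=False")
    return cmd


if __name__ == "__main__":
    try:
        import torch
        cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
    except ImportError:
        cap = None
    if cap is None:
        print("no CUDA device detected; nothing to decide")
    else:
        print(" ".join(train_command(cap)))
```

train.py also has a `dtype` setting, so overriding it to float16 while keeping compilation might be another way around the error, but nobody in this thread reports having tested that.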

@xinge449

The issue comes from torch.compile() so you can pass --compile=False as a workaround.

This worked on my side, thank you very much. Thanks a lot.

@lise-brinck

I am facing the same issue on a T4 GPU:

E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Error in subprocess
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] concurrent.futures.process._RemoteTraceback:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] """
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Traceback (most recent call last):
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/compiler.py", line 292, in make_cubin
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     subprocess.run(cmd, shell=True, check=True)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/subprocess.py", line 528, in run
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     raise CalledProcessError(retcode, process.args,
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] subprocess.CalledProcessError: Command '/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_75 /tmp/tmpjp5bgu11.ptx -o /tmp/tmpjp5bgu11.ptx.o 2> /tmp/tmp1ew0x1sq.log' returned non-zero exit status 255.
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] During handling of the above exception, another exception occurred:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Traceback (most recent call last):
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     r = call_item.fn(*call_item.args, **call_item.kwargs)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 218, in do_job
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     result = job()
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/runtime/compile_tasks.py", line 69, in _worker_compile_triton
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     load_kernel().precompile(warm_cache_only=True)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 232, in precompile
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     compiled_binary, launcher = self._precompile_config(
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 416, in _precompile_config
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     triton.compile(*compile_args, **compile_kwargs),
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/compiler/compiler.py", line 282, in compile
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     next_module = compile_ir(module, metadata)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/compiler.py", line 320, in <lambda>
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/triton/backends/nvidia/compiler.py", line 297, in make_cubin
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     raise RuntimeError(f'Internal Triton PTX codegen error: \n{log}')
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] RuntimeError: Internal Triton PTX codegen error:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas fatal   : Ptx assembly aborted due to errors
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] """
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] The above exception was the direct cause of the following exception:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] Traceback (most recent call last):
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/nanoGPT/.venv/lib/python3.9/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 203, in callback
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     result = future.result()
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/concurrent/futures/_base.py", line 439, in result
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     return self.__get_result()
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]   File "/home/ubuntu/.pyenv/versions/3.9.17/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205]     raise self._exception
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] RuntimeError: Internal Triton PTX codegen error:
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 66; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 69; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 72; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 75; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 78; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 81; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 84; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature '.bf16' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas /tmp/tmpjp5bgu11.ptx, line 87; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
E0813 06:09:49.845341 140336920782592 torch/_inductor/compile_worker/subproc_pool.py:205] ptxas fatal   : Ptx assembly aborted due to errors
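
The CalledProcessError line in this log also records which target Triton asked ptxas to build for. A small sketch (the helper name `ptxas_target` is made up for illustration) pulls that number out so it can be compared with the `sm_80` minimum from the error lines:

```python
import re


def ptxas_target(command):
    """Return the integer after --gpu-name=sm_ in a ptxas command, or None."""
    match = re.search(r"--gpu-name=sm_(\d+)", command)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Flags copied from the log above; the ptxas path is shortened here.
    cmd = "ptxas -lineinfo -v --gpu-name=sm_75 /tmp/tmpjp5bgu11.ptx -o /tmp/tmpjp5bgu11.ptx.o"
    target = ptxas_target(cmd)
    verdict = "meets" if target >= 80 else "is below"
    print(f"target sm_{target} {verdict} the sm_80 minimum for bf16")
```

Here the target is sm_75, which is below sm_80 and matches the failure. The same log line shows the ptxas binary living under the Triton package in site-packages, so the system nvcc is probably not the toolchain involved.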

PyTorch version: 2.4.0+cu118
nvcc version:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Output of nvidia-smi:

Tue Aug 13 06:11:55 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8    14W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
