
MPI Abort Error when using disaggServerBenchmark #2518

Open
2 of 4 tasks
zhangts20 opened this issue Dec 2, 2024 · 10 comments
Labels
bug Something isn't working · triaged Issue has been triaged by maintainers

Comments

@zhangts20

System Info

  • CPU: x86_64 (Ubuntu 20.04.6 LTS)
  • GPU: H100 * 8
  • CUDA: 12.5.1
  • TensorRT-LLM: The latest dev commit, 3856265
  • TensorRT: 10.6.0
  • Python: 3.10.14
  • PyTorch: 2.5.0

Who can help?

@ncomly-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Build tensorrt_llm: python scripts/build_wheel.py --trt_root=/usr/local/tensorrt --clean --cuda_architectures='90-real' --benchmarks
  2. Follow https://github.com/NVIDIA/TensorRT-LLM/tree/main/benchmarks/cpp#4launch-c-disaggserverbenchmark to build a llama2-7b-tp1 engine and a llama2-7b-tp2 engine with the default build args
  3. mpirun -n 7 disaggServerBenchmark --context_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2 --generation_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2

Expected behavior

success

Actual behavior

[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Will Launch benchmark with 2 context engines and 2 generation engines. Context Engines:/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2, ; Generation Engines:/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2, ;
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[40a8c9673b05:1334630] Read -1, expected 16777216, errno = 14
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[40a8c9673b05:1334629] Read -1, expected 33554432, errno = 14
[40a8c9673b05:1334631] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1334621] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[40a8c9673b05:1334621] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Additional notes

Thanks for your attention!

@zhangts20 zhangts20 added the bug Something isn't working label Dec 2, 2024
@chuangz0
Collaborator

chuangz0 commented Dec 2, 2024

Please pass --dataset to the commands and verify your engines with gptManagerBenchmark first.
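For illustration, the verification could look roughly like this; the dataset path is a placeholder and the exact flag names may differ between TensorRT-LLM versions:

```bash
# Sanity-check each engine on its own with gptManagerBenchmark first
# (use mpirun -n 2 for the tp2 engine).
./benchmarks/gptManagerBenchmark \
    --engine_dir /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1 \
    --dataset /path/to/dataset.json

# Then pass the same dataset to disaggServerBenchmark.
mpirun -n 7 disaggServerBenchmark \
    --context_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2 \
    --generation_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2 \
    --dataset /path/to/dataset.json
```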

@hello-11 hello-11 added the triaged Issue has been triaged by maintainers label Dec 2, 2024
@zhangts20
Author

Please pass --dataset to the commands and verify your engines with gptManagerBenchmark first.

I have passed --dataset to disaggServerBenchmark, and both llama2-7b-tp1 and llama2-7b-tp2 run fine with gptManagerBenchmark.

@chuangz0
Collaborator

chuangz0 commented Dec 2, 2024

Could you comment out the following block

```cpp
        for (int sig : {SIGABRT, SIGSEGV})
        {
            __sighandler_t previousHandler = nullptr;
            if (forwardAbortToParent)
            {
                previousHandler = std::signal(sig,
                    [](int signal)
                    {
#ifndef _WIN32
                        pid_t parentProcessId = getppid();
                        kill(parentProcessId, SIGKILL);
#endif
                        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
                    });
            }
            else
            {
                previousHandler = std::signal(sig, [](int signal) { MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE); });
            }
            TLLM_CHECK_WITH_INFO(previousHandler != SIG_ERR, "Signal handler setup failed");
        }
```

in `cpp/tensorrt_llm/common/mpiUtils.cpp`, then recompile it and run?
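For reference, a possible incremental rebuild after editing `mpiUtils.cpp`, reusing the flags from the reproduction steps (dropping `--clean` so only the changed sources are recompiled):

```bash
# Same flags as the original build, minus --clean,
# so only the modified translation units are recompiled.
python scripts/build_wheel.py \
    --trt_root=/usr/local/tensorrt \
    --cuda_architectures='90-real' \
    --benchmarks
```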

Which container do you use?

@zhangts20
Author

Could you comment out the signal handler block in `cpp/tensorrt_llm/common/mpiUtils.cpp`, then recompile it and run? Which container do you use?

Thanks, I will try it. I installed tensorrt_llm from source in my own container, and the environment info is as mentioned above.

@chuangz0
Collaborator

chuangz0 commented Dec 3, 2024

Maybe you can try the Docker image built following the instructions at https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html#building-a-tensorrt-llm-docker-image
We have only tested disaggServerBenchmark in a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.
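For illustration, a rough sketch of building and entering such a container, assuming the `make -C docker` targets described on that page (target names may vary by version):

```bash
# Build the TensorRT-LLM development image (based on nvcr.io/nvidia/pytorch:24.10-py3)
# and open an interactive shell in it.
make -C docker build
make -C docker run
```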

@zhangts20
Author

Could you comment out the signal handler block in `cpp/tensorrt_llm/common/mpiUtils.cpp`, then recompile it and run? Which container do you use?

I rebuilt tensorrt_llm, and now the error looks like this (there is an error about permissions):

[40a8c9673b05:1447853] Read -1, expected 33554432, errno = 14
[40a8c9673b05:1447850] *** Process received signal ***
[40a8c9673b05:1447850] Signal: Segmentation fault (11)
[40a8c9673b05:1447850] Signal code: Invalid permissions (2)
[40a8c9673b05:1447850] Failing at address: 0x9c5c12400
[40a8c9673b05:1447850] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f39b8619520]
[40a8c9673b05:1447850] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7f39b877d7cd]
[40a8c9673b05:1447850] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7f39600ec244]
[40a8c9673b05:1447850] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7f3960048556]
[40a8c9673b05:1447850] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7f3960046811]
[40a8c9673b05:1447850] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7f39600f0ae5]
[40a8c9673b05:1447850] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7e24)[0x7f39600f0e24]
[40a8c9673b05:1447850] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7f39b8a3f714]
[40a8c9673b05:1447850] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7f39b8a4c38d]
[40a8c9673b05:1447850] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_mprobe+0x52d)[0x7f39600432fd]
[40a8c9673b05:1447850] [10] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Mprobe+0xd7)[0x7f39b8b440e7]
[40a8c9673b05:1447850] [11] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm6mprobeEiiPP14ompi_message_tP20ompi_status_public_t+0x2a)[0x7f39be09be6a]
[40a8c9673b05:1447850] [12] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl19leaderRecvReqThreadEv+0x133)[0x7f39c03c4e23]
[40a8c9673b05:1447850] [13] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7f39bbee7930]
[40a8c9673b05:1447850] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f39b866bac3]
[40a8c9673b05:1447850] [15] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f39b86fd850]
[40a8c9673b05:1447850] *** End of error message ***
[40a8c9673b05:1447854] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1447851] *** Process received signal ***
[40a8c9673b05:1447851] Signal: Segmentation fault (11)
[40a8c9673b05:1447851] Signal code: Invalid permissions (2)
[40a8c9673b05:1447851] Failing at address: 0x9a2d12600
[40a8c9673b05:1447851] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fb0f5419520]
[40a8c9673b05:1447851] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7fb0f557d7cd]
[40a8c9673b05:1447851] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7fb09c30a244]
[40a8c9673b05:1447851] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7fb09c165556]
[40a8c9673b05:1447851] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7fb09c163811]
[40a8c9673b05:1447851] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fb09c30eae5]
[40a8c9673b05:1447851] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7e24)[0x7fb09c30ee24]
[40a8c9673b05:1447851] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fb0f583f714]
[40a8c9673b05:1447851] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7fb0f584c38d]
[40a8c9673b05:1447851] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_mprobe+0x52d)[0x7fb09c1602fd]
[40a8c9673b05:1447851] [10] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Mprobe+0xd7)[0x7fb0f59440e7]
[40a8c9673b05:1447851] [11] [40a8c9673b05:1447855] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1447852] *** Process received signal ***
[40a8c9673b05:1447852] Signal: Segmentation fault (11)
[40a8c9673b05:1447852] Signal code: Invalid permissions (2)
[40a8c9673b05:1447852] Failing at address: 0x9a2d12600
[40a8c9673b05:1447852] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc1fee19520]
[40a8c9673b05:1447852] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7fc1fef7d7cd]
[40a8c9673b05:1447852] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7fc1a598e244]
[40a8c9673b05:1447852] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7fc1a53e8556]
[40a8c9673b05:1447852] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7fc1a53e6811]
[40a8c9673b05:1447852] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fc1a5992ae5]
[40a8c9673b05:1447852] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7db1)[0x7fc1a5992db1]
[40a8c9673b05:1447852] [ 7] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm6mprobeEiiPP14ompi_message_tP20ompi_status_public_t+0x2a)[0x7fb0fae9be6a]
[40a8c9673b05:1447851] [12] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fc1ff23f714]
[40a8c9673b05:1447852] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7fc1ff24c38d]
[40a8c9673b05:1447852] [ 9] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_wait+0x24b)[0x7fc1ff3192db]
[40a8c9673b05:1447852] [10] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_bcast_intra_generic+0x5ea)[0x7fc1ff36d40a]
[40a8c9673b05:1447852] [11] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_bcast_intra_pipeline+0xd1)[0x7fc1ff36e6c1]
[40a8c9673b05:1447852] [12] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0x40)[0x7fc1a534b640]
[40a8c9673b05:1447852] [13] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Bcast+0x121)[0x7fc1ff32d881]
[40a8c9673b05:1447852] [14] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl19leaderRecvReqThreadEv+0x133)[0x7fb0fd1c4e23]
[40a8c9673b05:1447851] [13] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm5bcastEPvmNS0_7MpiTypeEi+0x47)[0x7fc20489d7b7]
[40a8c9673b05:1447852] [15] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl16getNewReqWithIdsEiSt8optionalIfE+0x68b)[0x7fc206bb787b]
[40a8c9673b05:1447852] [16] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7fb0f8ce7930]
[40a8c9673b05:1447851] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fb0f546bac3]
[40a8c9673b05:1447851] [15] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl16fetchNewRequestsEiSt8optionalIfE+0x59)[0x7fc206bc5949]
[40a8c9673b05:1447852] [17] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7fb0f54fd850]
[40a8c9673b05:1447851] *** End of error message ***
/xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl13executionLoopEv+0x3bd)[0x7fc206bc7f5d]
[40a8c9673b05:1447852] [18] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7fc2026e7930]
[40a8c9673b05:1447852] [19] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fc1fee6bac3]
[40a8c9673b05:1447852] [20] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7fc1feefd850]
[40a8c9673b05:1447852] *** End of error message ***

@chuangz0
Collaborator

chuangz0 commented Dec 3, 2024

Maybe you can't start the trtllm executor in orchestrator mode in your container environment.
Could you run executorExampleAdvanced in examples/cpp/executor in orchestrator mode?
If your MPI is based on UCX, please set the env var UCX_MEMTYPE_CACHE=n.
Please make sure your MPI is CUDA-aware.
I highly recommend using a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.
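As a quick check, something along these lines can confirm whether the Open MPI build is CUDA-aware and disable the UCX memtype cache before launching (assuming `ompi_info` is on the PATH):

```bash
# Prints "...:value:true" if Open MPI was built with CUDA support.
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

# Only needed if the MPI transports run over UCX.
export UCX_MEMTYPE_CACHE=n
```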

@zhangts20
Author

Maybe you can't start the trtllm executor in orchestrator mode in your container environment. Could you run executorExampleAdvanced in examples/cpp/executor in orchestrator mode? If your MPI is based on UCX, please set the env var UCX_MEMTYPE_CACHE=n. Please make sure your MPI is CUDA-aware. I highly recommend using a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.

Thanks, I have executed executorExampleAdvanced successfully.

./build/executorExampleAdvanced --engine_dir /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1 --input_tokens_csv_file ./inputTokens.csv --use_orchestrator_mode --worker_executable_path ../../../cpp/build/tensorrt_llm/executor_worker/executorWorker

The output log:

[TensorRT-LLM][INFO] Engine version 0.16.0.dev2024112600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Engine version 0.16.0.dev2024112600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 2048
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 2048
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 2048
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (2048) * 32
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 8192
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 2047 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: min(maxSequenceLen, maxNumTokens).
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] Loaded engine size: 12869 MiB
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1112.01 MiB for execution context memory.
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 12853 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 346.17 MB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1.16 GB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 79.11 GiB, available: 26.40 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 761
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 32
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 23.78 GiB for max tokens in paged KV cache (48704).
[TensorRT-LLM][INFO] Enable MPI KV cache transport.
[TensorRT-LLM][INFO] Executor instance created by worker
[TensorRT-LLM][INFO] Reading input tokens from ./inputTokens.csv
[TensorRT-LLM][INFO] Number of requests: 3
[TensorRT-LLM][INFO] Creating request with 6 input tokens
[TensorRT-LLM][INFO] Creating request with 4 input tokens
[TensorRT-LLM][INFO] Creating request with 10 input tokens
[TensorRT-LLM][INFO] Got 20 tokens for seqIdx 0 for requestId 3
[TensorRT-LLM][INFO] Request id 3 is completed.
[TensorRT-LLM][INFO] Got 14 tokens for seqIdx 0 for requestId 2
[TensorRT-LLM][INFO] Request id 2 is completed.
[TensorRT-LLM][INFO] Got 16 tokens for seqIdx 0 for requestId 1
[TensorRT-LLM][INFO] Request id 1 is completed.
[TensorRT-LLM][INFO] Writing output tokens to outputTokens.csv
[TensorRT-LLM][INFO] Exiting.
[TensorRT-LLM][INFO] Orchestrator sendReq thread exiting
[TensorRT-LLM][INFO] Orchestrator recv thread exiting
[TensorRT-LLM][INFO] Leader recvReq thread exiting
[TensorRT-LLM][INFO] Leader sendThread exiting
[TensorRT-LLM][INFO] Refreshed the MPI local session

@chuangz0
Collaborator

chuangz0 commented Dec 3, 2024

Are you having trouble using containers based on nvcr.io/nvidia/pytorch:24.10-py3?

@zhangts20
Author

Are you having trouble using containers based on nvcr.io/nvidia/pytorch:24.10-py3?

I will try it later; it seems my current env can use orchestrator mode.
